The requirements
Easy to scale
Replication
S3 Compatible API
Must run as an application on top of Ubuntu 12.04 LTS
Must use the storage available on a VPS
Can limit disk usage per node (a nice-to-have, not an absolute requirement)
Can work on low to medium latency connections
The contenders
Riak CS
Runs on top of Riak
Riak is a database (key/value store) and therefore runs on most *nix-based distributions
Easy to add nodes using command line tools to a Riak cluster
Has an S3-compatible API
Replication is a must for a Riak cluster
No node is master/slave, all are equal (good for HA)
Has a nice web administration tool
Eucalyptus Walrus
Moderately difficult to add nodes
Has add-ons for S3-compatible APIs
Can use a normal folder for storage, but not if replication is used
Must have a block device to replicate
Easy to limit usage on a node
Difficult to configure replication
Must have a Cloud Controller, and for HA, a secondary controller
OpenStack Swift
Moderately difficult to scale
Can have problems with medium latency connections, because each write has to reach a majority of nodes
S3-compatible API available as an add-on
Must have block devices
Easy to limit usage on a node
Syncs through rsync. (I never liked rsync...)
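The latency concern noted for Swift above can be illustrated with a rough sketch. It assumes a write is acknowledged once a strict majority of replicas has confirmed it, so the write takes as long as the slowest replica inside that majority. The function name and the round-trip figures are made up for illustration; they are not measurements.

```python
def majority_write_latency(replica_latencies_ms):
    """Time until a strict majority of replicas has acknowledged a write.

    Assumption: all replicas are written in parallel and the client
    waits for floor(n/2) + 1 acknowledgements.
    """
    if not replica_latencies_ms:
        raise ValueError("need at least one replica")
    quorum = len(replica_latencies_ms) // 2 + 1
    # The quorum-th fastest replica decides when the write completes.
    return sorted(replica_latencies_ms)[quorum - 1]


# Hypothetical round-trip times in milliseconds.
same_datacenter = [2, 3, 4]   # three replicas close together
spread_out = [2, 80, 120]     # one local replica, two on remote VPS nodes

print(majority_write_latency(same_datacenter))  # 3
print(majority_write_latency(spread_out))       # 80
```

With three replicas, one slow node can be ignored, but as soon as two of the three are remote, every write pays for a remote round trip.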
Cloudian
Has a Community Edition, but the documentation is sign-up only (VMware/Citrix anyone?)
Claims to be OSS, but in reality it is not.
I read up on the docs, but the documentation is sparse and it does not feel "production-ready"
Apache Cloudstack
Has an S3 API
Not usable as it requires a management server and a host/hypervisor system
Ceph
Is a distributed file system.
Easy to add nodes
Replicates across nodes
Has an S3-compatible API, although with some limitations (http://ceph.com/docs/next/radosgw/s3/)
Has a nice deploy-tool
Requires block devices for storage
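The notes above can be condensed into a small lookup table to see which contenders fit a VPS that has no spare block device. The sketch below only restates what the lists say. The field names are invented for this example, and Cloudian and CloudStack are left out because they were ruled out for other reasons.

```python
# Each entry restates points from the lists above; field names are illustrative.
contenders = {
    "Riak CS": {"s3_api": "yes", "needs_block_device": False},
    # Walrus can use a normal folder, but not once replication is enabled.
    "Eucalyptus Walrus": {"s3_api": "add-on", "needs_block_device": True},
    "OpenStack Swift": {"s3_api": "add-on", "needs_block_device": True},
    "Ceph": {"s3_api": "yes, with limitations", "needs_block_device": True},
}


def runs_on_plain_storage(table):
    """Return the contenders that can replicate without a block device."""
    return sorted(name for name, facts in table.items()
                  if not facts["needs_block_device"])


print(runs_on_plain_storage(contenders))  # ['Riak CS']
```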
Wrap up
Basically, this gives two different directions: setting up Riak CS directly on the system, or choosing Ceph, Walrus or Swift and setting up a file as a block device.
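The file-as-a-block-device route usually starts with a sparse file that is then attached as a loop device. A minimal sketch follows, assuming a Linux host with `losetup` from util-linux. The path and the 64 MiB size are placeholders, and the attach command is only printed because running it requires root.

```python
import os
import shlex
import tempfile


def create_backing_file(path, size_bytes):
    """Create a sparse file of the given size to back a loop device."""
    with open(path, "wb") as handle:
        # truncate() extends the file without writing data, so on most
        # Linux filesystems no disk blocks are allocated yet.
        handle.truncate(size_bytes)
    return path


def losetup_command(path):
    """Build (but do not run) the command that attaches the file."""
    # --find picks the first free loop device, --show prints its name.
    return ["losetup", "--find", "--show", path]


# Placeholder location and size, for illustration only.
backing = create_backing_file(
    os.path.join(tempfile.gettempdir(), "storage-node.img"), 64 * 1024 ** 2)
print(os.path.getsize(backing))
print("sudo " + " ".join(shlex.quote(arg) for arg in losetup_command(backing)))
```

Because the backing file has a fixed size, it also acts as a rough cap on how much disk the storage service can use on that node.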
After reading up on the docs, I am considering both Ceph and Riak CS, and will start by testing Riak CS.
Both provide good Chef cookbooks, so for large-scale deployments, take the time to set up Chef properly. If you plan to grow, it will save you time when you need that next node.
However, this is most likely going to be more expensive than using cloud storage, so do consider whether you want to spend your time on this or just pay for cloud storage.
Other OpenStack options would work fine as well, since the client library I am using supports both.
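To make "S3 compatible" a bit more concrete: from the client side it mostly means pointing at a different endpoint and signing requests the way S3 expects. The sketch below builds the Authorization header of the older AWS Signature Version 2 scheme using only the standard library. Whether a given store accepts this scheme is an assumption to check against its documentation, and the keys, bucket and object names are placeholders.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_v2(access_key, secret_key, verb, resource, date,
            content_md5="", content_type=""):
    """Return an AWS Signature Version 2 Authorization header value.

    Simplification: no x-amz-* headers are included in the signature.
    """
    string_to_sign = "\n".join(
        [verb, content_md5, content_type, date]) + "\n" + resource
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "AWS {}:{}".format(access_key, signature)


# Placeholder credentials and object path, for illustration only.
date = formatdate(usegmt=True)  # must match the Date header of the request
header = sign_v2("EXAMPLE-ACCESS-KEY", "example-secret",
                 "GET", "/my-bucket/backup.tar.gz", date)
print(header)
```

In practice a client library does this for you; the point is only that the same signed request can be sent to any endpoint that speaks the S3 protocol.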