Object Versioning for posix/scoutfs

Open benmcclelland opened this issue 7 months ago • 7 comments

Describe the solution you'd like

We would like to optionally support object versioning compatible with AWS S3. The following requirements and behaviors are expected:

  • Only enabled when specifically configured
  • Each stored version of an object consumes filesystem capacity. Some filesystems may be able to de-duplicate the data extents, but that is outside the scope of the gateway, and the gateway won't do anything specific to enable it.
  • Versioning can be enabled/disabled per bucket once configured for the gateway, defaulting to disabled

Objectives

Versioning behavior compatible with AWS S3 when enabled. AWS documentation can be found here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html

Design

To enable versioning, a directory should be configured for storing the non-current object versions. The older object versions should not be stored within the gateway root namespace, to prevent confusion when the namespace is accessed outside of S3. When an existing object is deleted or overwritten by an upload, the older version can be moved to the version directory. If the version directory is within the same filesystem, the move will likely be fast, since the file data does not need to be rewritten. If it is not within the same filesystem, the move will have to copy all file data to the new location. This is handled automatically in file renaming.

Version Namespace

The directory structure for the older object versions does not need to map to POSIX filenames the way the primary namespace does. The easiest namespace for these would be based on a sha256 hash of the object name, with a small directory structure built from that hash. The top-level directory will still need to be the bucket, to prevent collisions across buckets. To be nicer to POSIX filesystems and avoid placing all objects in the same directory, we can split the object name hash into directories based on the first few bytes of the hash. This is a common tactic in other projects. For example:

bucket: mybucket
object: dir1/dir2/myobject
sha256("dir1/dir2/myobject") = cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100

location of version "1":

<version directory>/mybucket/ce/fc/88/cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100/1

Version IDs

In AWS, each object version has an associated ID that uniquely identifies that version. We can explore a few options here:

  • timestamp
  • counter
  • random string
  • uuid

There needs to be some consideration of ordering when listing results. The list object versions call probably expects results in time order?

Delete Markers

When an object is deleted, the current object is moved to the version directory and a new empty object is placed in the primary namespace with a delete marker attribute, indicating that this object shouldn't be listed or retrieved (as it was deleted). Older versions can still be restored to replace the delete marker object. We will likely just add a new xattr to signify that the file is a delete marker, and handle this accordingly in the listing walks.

list-object-versions

We need to list the object versions as well as the objects when list-object-versions is called. This can be handled in the listing walk function by looking into the version namespace for each object visited. The walk function results may need to be modified to handle versioning.

RFC

This is intended as an RFC-style proposal, open to comments. Any requirement changes or design change proposals can be discussed in the issue comments.

benmcclelland, Jul 16 '24 17:07