b2-sdk-python Bucket to bucket "sync" of all versions

Currently, the SDK is able to synchronize files between two B2 buckets (implemented in #165), but it synchronizes only the latest versions as the whole idea of synchronization works on files and not on file versions.

We may consider adding a feature to sync every version of each file. It might not be part of b2 sync but a separate command, or it could be a special b2 sync mode.

Oct 28 '20 07:10 mlech-reef

sync already has options to filter file versions: server-side file versions can be filtered out, and the client-reported times can also be used (not sure if all of that is implemented right now, but you get the idea). One reason for this is, for example, a backup uploading encrypted garbage during a ransomware/cryptolocker attack. Being able to restore a bucket from such a situation by cloning it to a fresh bucket (mostly using server-side copy!) might be a good option, especially if some files were not encrypted yet and new versions of them were backed up. It's really up to the user how they deal with this, but the tools should be there.
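To make the ransomware case concrete, here is a minimal sketch of the selection step only: for each file name, pick the newest version uploaded before a cutoff time. It works on plain tuples and does not call the SDK; `Version`, `latest_before` and `cutoff_ts` are hypothetical names made up for illustration, not existing b2sdk API.

```python
from typing import Dict, Iterable, NamedTuple


class Version(NamedTuple):
    """Hypothetical stand-in for one file version in a bucket listing."""
    name: str
    upload_ts: int  # server-side upload time, seconds since epoch
    file_id: str


def latest_before(versions: Iterable[Version], cutoff_ts: int) -> Dict[str, Version]:
    """Return, per file name, the newest version uploaded before cutoff_ts.

    Files whose versions were all uploaded at or after the cutoff are
    left out of the result entirely.
    """
    chosen: Dict[str, Version] = {}
    for v in versions:
        if v.upload_ts >= cutoff_ts:
            continue  # uploaded during or after the attack, skip it
        best = chosen.get(v.name)
        if best is None or v.upload_ts > best.upload_ts:
            chosen[v.name] = v
    return chosen
```

The selected versions would then be the candidates for a server-side copy into the fresh bucket.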

As sync already has so many options to tweak behavior of massive from/to bucket synchronization operations, I think we should add a switch to b2 sync (--mode=versions?) rather than create a new command, which would need to get most of b2 sync parameters anyway.

Oct 28 '20 11:10 ppolewicz

I agree that this would be a useful feature. Not sure of the priority.

What, exactly, would this feature do? It will not be able to replicate the upload times of the original files; their upload times will be the times the files were copied. It can preserve the metadata, including the file modification time. It can preserve the order of the versions of a file.

Because the upload times of the file versions in the destination bucket are different, the actions of lifecycle rules will be different in the source and destination buckets.
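A small simulation of that point, as a sketch under stated assumptions: versions are copied oldest-first so their relative order survives, the modification time travels with the metadata, and each copy gets a fresh upload time. `FileVersion`, `copy_versions` and `age_in_days` are hypothetical helpers, and the one-second spacing between copies is only there to keep the simulated order unambiguous.

```python
from dataclasses import dataclass
from typing import List

DAY = 86400  # seconds in a day


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical stand-in for a file version."""
    name: str
    upload_ts: int  # server-side upload time, seconds since epoch
    mod_ts: int     # client-reported modification time (metadata)


def copy_versions(source: List[FileVersion], copy_start_ts: int) -> List[FileVersion]:
    """Simulate copying versions oldest-first into another bucket.

    The order and mod_ts are preserved; upload_ts becomes the copy time.
    """
    ordered = sorted(source, key=lambda v: v.upload_ts)
    return [
        FileVersion(v.name, copy_start_ts + i, v.mod_ts)
        for i, v in enumerate(ordered)
    ]


def age_in_days(version: FileVersion, now_ts: int) -> int:
    """Whole days since upload, the kind of age a lifecycle rule would look at."""
    return (now_ts - version.upload_ts) // DAY
```

A version that is 100 days old in the source bucket is 0 days old right after the copy, so a rule based on days since upload would treat the two buckets differently.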

Oct 28 '20 16:10 bwbeach

This would, among other things, let users of old buckets clone them into new ones so that they could use the S3 interface with their data.

The server-side times will change, yes, but sync uses modification time.

It seems that to enable this we need just one or two functions that iterate over every file version in the bucket, instead of just the most recent version of every file, while respecting the filters that we already have.
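Roughly what such a function could look like, as a sketch and not the SDK's actual code: `Version`, `iter_versions`, `all_versions` and `keep` are hypothetical names. It assumes the listing arrives sorted by file name with the newest version first within each name, and that "latest" is decided before the filter is applied.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Version(NamedTuple):
    """Hypothetical stand-in for one entry of a file-version listing."""
    name: str
    upload_ts: int


def iter_versions(
    listing: Iterable[Version],
    all_versions: bool = False,
    keep: Callable[[Version], bool] = lambda v: True,
) -> Iterator[Version]:
    """Yield either the latest version per name or every version.

    Assumes `listing` is sorted by name, newest first within a name.
    `keep` plays the role of the filters sync already has.
    """
    previous_name = None
    for v in listing:
        is_latest = v.name != previous_name  # first entry seen for this name
        previous_name = v.name
        if (all_versions or is_latest) and keep(v):
            yield v
```

With `all_versions=False` this behaves like the current latest-only iteration; flipping the flag is the only difference between the two modes, which is why a switch on the existing command looks cheaper than a new command.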

We might specifically not copy the lifecycle rules to force the user to re-apply them appropriately.

Oct 28 '20 17:10 ppolewicz
