gittargets
Pure AWS S3 backend
Prework
- [x] Read and agree to the Contributor Code of Conduct and contributing guidelines.
- [x] If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
- [x] New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.
- [x] Format your code according to the tidyverse style guide.
Proposal
Similar to #2, but directly implemented on top of AWS S3 through something like `aws.s3`, `paws`, or `botor`. Use the historical versioning and tagging capabilities of buckets.
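For the bucket-versioning piece, the idea could be sketched with `paws` like this (the bucket name is a placeholder, and this assumes AWS credentials are already configured):

```r
# Sketch only: "my-gittargets-bucket" is a placeholder bucket name.
library(paws)

s3 <- paws::s3()
s3$put_bucket_versioning(
  Bucket = "my-gittargets-bucket",
  VersioningConfiguration = list(Status = "Enabled")
)
# After this, overwriting an object keeps its prior versions, which can be
# listed with s3$list_object_versions(Bucket = "my-gittargets-bucket").
```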
~~Probably precedes #2.~~ May actually want to look at DVC first. It may already do a lot of the stuff I mention below.
Setback: an S3 object can only have up to 10 tags. This poses a problem if a target is part of more than 10 snapshots, which is likely to come up for almost all projects.
Another idea: the metadata already has hashes, which is half the battle for a key-value store.
Snapshot
- Commit `_targets/meta/meta` to a local git repo. Do not commit `_targets/objects`.
- For each target in `_targets/meta/meta`, upload the file in `_targets/objects` to an S3 bucket. In the bucket, the object name should be the hash recorded in `_targets/meta/meta`. If the object already exists in the bucket, skip the upload.
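The snapshot steps above could look roughly like this (a sketch, not an implementation: the bucket name is a placeholder, and it assumes `targets::tar_meta()` exposes the object hashes in its `data` column):

```r
# Sketch only: assumes AWS credentials are configured for aws.s3, and
# "my-gittargets-bucket" is a placeholder bucket name.
library(aws.s3)

snapshot <- function(bucket = "my-gittargets-bucket") {
  # 1. Commit the metadata file (but not _targets/objects) to the local repo.
  system2("git", c("add", "_targets/meta/meta"))
  system2("git", c("commit", "-m", shQuote("snapshot metadata")))
  # 2. Upload each object under its hash as the S3 key, skipping duplicates.
  meta <- targets::tar_meta() # assumed: name column plus data (hash) column
  for (i in seq_len(nrow(meta))) {
    hash <- meta$data[i]
    file <- file.path("_targets", "objects", meta$name[i])
    if (is.na(hash) || !file.exists(file)) next # e.g. targets with no file
    if (!object_exists(object = hash, bucket = bucket)) {
      put_object(file = file, object = hash, bucket = bucket)
    }
  }
}
```

Keying objects by hash gives deduplication for free: a target shared by many snapshots is stored once.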
Checkout
- Check out the metadata file.
- For each target in the metadata, if the hash in `_targets/meta/meta` disagrees with the actual hash of the file, attempt to find the correct hash in the bucket and download the object to `_targets/objects/`.
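A sketch of that checkout logic, with the same placeholder bucket; the `xxhash64` algorithm here is an assumption about how `targets` hashes object files:

```r
# Sketch only: placeholder bucket name; "xxhash64" is an assumption about
# the hash algorithm targets uses for object files.
library(aws.s3)

checkout <- function(ref, bucket = "my-gittargets-bucket") {
  # 1. Check out the metadata file at the chosen git reference.
  system2("git", c("checkout", ref, "--", "_targets/meta/meta"))
  # 2. Re-download any object whose on-disk hash disagrees with the metadata.
  meta <- targets::tar_meta()
  for (i in seq_len(nrow(meta))) {
    hash <- meta$data[i]
    file <- file.path("_targets", "objects", meta$name[i])
    if (is.na(hash)) next
    local <- if (file.exists(file)) {
      digest::digest(file, algo = "xxhash64", file = TRUE)
    } else {
      NA_character_
    }
    if (!identical(local, hash)) {
      save_object(object = hash, bucket = bucket, file = file)
    }
  }
}
```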
Hopefully (2) will be possible without cloning a bunch of infrastructure from `targets`.
Status
Git status of `_targets/meta/meta`, plus checking the hashes in `_targets/meta/meta` against the `_targets/objects` files and against the bucket.
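That three-way comparison could be sketched as follows (same assumptions as above: placeholder bucket name, hashes in the `data` column of `tar_meta()`, and `xxhash64` as the file-hash algorithm):

```r
# Sketch only: reports whether the metadata, local objects, and bucket agree.
status <- function(bucket = "my-gittargets-bucket") {
  # Git status of the metadata file itself.
  git <- system2(
    "git", c("status", "--porcelain", "_targets/meta/meta"), stdout = TRUE
  )
  meta <- targets::tar_meta()
  files <- file.path("_targets", "objects", meta$name)
  # Recorded hashes vs. the files on disk.
  local_hash <- vapply(files, function(f) {
    if (file.exists(f)) {
      digest::digest(f, algo = "xxhash64", file = TRUE)
    } else {
      NA_character_
    }
  }, character(1))
  stale <- !is.na(meta$data) &
    (is.na(local_hash) | local_hash != meta$data)
  # Recorded hashes vs. the bucket.
  in_bucket <- vapply(meta$data, function(h) {
    !is.na(h) && aws.s3::object_exists(object = h, bucket = bucket)
  }, logical(1))
  list(
    meta_uncommitted = length(git) > 0,
    stale_local = meta$name[stale],
    missing_in_bucket = meta$name[!in_bucket]
  )
}
```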
Closing in favor of https://github.com/ropensci/targets/issues/711
Reopening. Relative to native AWS versioning in targets, an AWS gittargets backend would allow less frequent uploads and allow users to opt in later in the project’s life cycle.
On reflection: if you're already using AWS S3, then https://books.ropensci.org/targets/cloud-storage.html is way better.