bupstash
s3 storage
This is something that has been a private WIP.
- Performance so far is good on some s3 providers, absolutely horrible on others.
- The fix seems like it will be extremely deep parallel fetch pipelining.
- Want something that we can provide as a service on bupstash.io.
- Want to allow people to run it themselves if they have their own cloud setup.
I'm on my 3rd implementation of this now; I can never quite get it right. A big problem is that I want to strongly resist pulling async into bupstash.
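For what it's worth, deep fetch pipelining doesn't necessarily require async: plain OS threads plus channels can keep many requests in flight. Here is a minimal sketch of that idea (all names here are hypothetical, and the "fetch" is simulated — this is not bupstash's actual code):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Fetch many objects concurrently with a fixed pool of worker threads.
// Each request is tagged with its index so results can be reassembled
// in order, even though workers complete out of order.
fn parallel_fetch<F>(keys: Vec<String>, workers: usize, fetch: F) -> Vec<Vec<u8>>
where
    F: Fn(&str) -> Vec<u8> + Send + Sync + 'static,
{
    let fetch = Arc::new(fetch);
    let (req_tx, req_rx) = mpsc::channel::<(usize, String)>();
    let req_rx = Arc::new(Mutex::new(req_rx));
    let (resp_tx, resp_rx) = mpsc::channel::<(usize, Vec<u8>)>();

    let mut handles = Vec::new();
    for _ in 0..workers {
        let req_rx = Arc::clone(&req_rx);
        let resp_tx = resp_tx.clone();
        let fetch = Arc::clone(&fetch);
        handles.push(thread::spawn(move || loop {
            // Take the next queued request; exit when the sender is gone.
            let msg = { req_rx.lock().unwrap().recv() };
            match msg {
                Ok((i, key)) => resp_tx.send((i, fetch(&key))).unwrap(),
                Err(_) => break,
            }
        }));
    }
    // Drop our copy so resp_rx closes once all workers finish.
    drop(resp_tx);

    let n = keys.len();
    for (i, k) in keys.into_iter().enumerate() {
        req_tx.send((i, k)).unwrap();
    }
    drop(req_tx);

    let mut out = vec![Vec::new(); n];
    for (i, data) in resp_rx {
        out[i] = data;
    }
    for h in handles {
        h.join().unwrap();
    }
    out
}

fn main() {
    // Simulated fetch: the "object body" is just the key bytes.
    let keys: Vec<String> = (0..8).map(|i| format!("chunk-{}", i)).collect();
    let out = parallel_fetch(keys, 4, |k: &str| k.as_bytes().to_vec());
    assert_eq!(out[3], b"chunk-3".to_vec());
    println!("fetched {} chunks", out.len());
}
```

The real difficulty is less the fan-out itself and more bounding memory while keeping the pipeline full, which is where the design keeps getting re-done.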
A question about the S3 backend design and its implications for using Glacier/DEEP_ARCHIVE:
Is there strong separation between all data and metadata files in the storage engine?
The repo layout file at https://bupstash.io/doc/man/bupstash-repository.html doesn't make it clear if the tar content listing is in items/ or data/.
This would be crucial to making it possible to put the metadata into storage with low per-access costs and low latency, while pushing the data to much cheaper storage (if I need my backup restored, I can wait 12 hours for the S3 RestoreObject call to complete).
The content listing is stored in data/. Splitting the tiers is something I have considered and may add in a future release, though S3 also supports automatic intelligent access tiering, which is another alternative.
S3 Intelligent-Tiering ends up being the worst possible pricing for backup media with known workloads. It doesn't immediately put most content into the DEEP_ARCHIVE storage class where it could be.
Splitting the listing would absolutely be needed then, since content listings are in data/. As an alternative, it could be made possible to have multiple repos which don't all hold the same data: e.g. a local store that keeps only the last 7 days, plus Glacier storage that holds years of backups.
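To illustrate the split-tier idea: assuming a hypothetical layout where only bulk chunk data lives under a data/ prefix and metadata lives elsewhere, a standard S3 lifecycle rule could push just that prefix into DEEP_ARCHIVE immediately (the rule ID and prefix here are illustrative, not anything bupstash defines today):

```json
{
  "Rules": [
    {
      "ID": "archive-bulk-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "data/" },
      "Transitions": [
        { "Days": 0, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```

With a layout like this, metadata reads stay cheap and fast, while restores of old data pay only the one-time RestoreObject latency.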