bupstash
s3 storage
This is something that has been a private WIP.
- Performance so far is good on some s3 providers, absolutely horrible on others.
- The fix seems like it will be extremely deep parallel fetch pipelining.
- Want something that we can provide as a service on bupstash.io.
- Want to allow people to run it themselves if they have their own cloud setup.
I'm on my 3rd implementation of this now; I can never quite get it right. A big problem is that I want to strongly resist pulling async into bupstash.
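For what it's worth, deep fetch pipelining doesn't necessarily require async: plain OS threads plus channels can keep many requests in flight. Here is a minimal sketch of that idea (all names here are hypothetical, and the "fetch" is simulated — this is not bupstash's actual code):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Fetch many objects concurrently with a fixed pool of worker threads.
// Each request is tagged with its index so results can be reassembled
// in order, even though workers complete out of order.
fn parallel_fetch<F>(keys: Vec<String>, workers: usize, fetch: F) -> Vec<Vec<u8>>
where
    F: Fn(&str) -> Vec<u8> + Send + Sync + 'static,
{
    let fetch = Arc::new(fetch);
    let (req_tx, req_rx) = mpsc::channel::<(usize, String)>();
    let req_rx = Arc::new(Mutex::new(req_rx));
    let (resp_tx, resp_rx) = mpsc::channel::<(usize, Vec<u8>)>();

    let mut handles = Vec::new();
    for _ in 0..workers {
        let req_rx = Arc::clone(&req_rx);
        let resp_tx = resp_tx.clone();
        let fetch = Arc::clone(&fetch);
        handles.push(thread::spawn(move || loop {
            // Take the next queued request; exit when the sender is gone.
            let msg = { req_rx.lock().unwrap().recv() };
            match msg {
                Ok((i, key)) => resp_tx.send((i, fetch(&key))).unwrap(),
                Err(_) => break,
            }
        }));
    }
    // Drop our copy so resp_rx closes once all workers finish.
    drop(resp_tx);

    let n = keys.len();
    for (i, k) in keys.into_iter().enumerate() {
        req_tx.send((i, k)).unwrap();
    }
    drop(req_tx);

    let mut out = vec![Vec::new(); n];
    for (i, data) in resp_rx {
        out[i] = data;
    }
    for h in handles {
        h.join().unwrap();
    }
    out
}

fn main() {
    // Simulated fetch: the "object body" is just the key bytes.
    let keys: Vec<String> = (0..8).map(|i| format!("chunk-{}", i)).collect();
    let out = parallel_fetch(keys, 4, |k: &str| k.as_bytes().to_vec());
    assert_eq!(out[3], b"chunk-3".to_vec());
    println!("fetched {} chunks", out.len());
}
```

The real difficulty is less the fan-out itself and more bounding memory while keeping the pipeline full, which is where the design keeps getting re-done.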
A question about the S3 backend design and its implications for using Glacier/DEEP_ARCHIVE:
Is there strong separation between all data and metadata files in the storage engine?
The repo layout file at https://bupstash.io/doc/man/bupstash-repository.html doesn't make it clear if the tar content listing is in items/ or data/.
This would be crucial to making it possible to put the metadata into storage with low per-access costs and low latency, while pushing the data to much cheaper storage (if I need my backup restored, I can wait 12 hours for the S3 RestoreObject call to complete).
The content listing is stored in data/. Splitting the tiers is something I have considered and may add in a future release, though S3 also supports automatic intelligent access tiering, which is another alternative.
S3 Intelligent-Tiering ends up being the worst possible pricing for backup media with known workloads. It doesn't immediately put most content into the DEEP_ARCHIVE storage class where it could be.
Splitting the listing would absolutely be needed then, since content listings are in data/. As an alternative, it could be made possible to have multiple repos which don't all hold the same data: e.g. a local store that keeps only the last 7 days, plus Glacier storage that holds years of backups.
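To illustrate the split-tier idea: assuming a hypothetical layout where only bulk chunk data lives under a data/ prefix and metadata lives elsewhere, a standard S3 lifecycle rule could push just that prefix into DEEP_ARCHIVE immediately (the rule ID and prefix here are illustrative, not anything bupstash defines today):

```json
{
  "Rules": [
    {
      "ID": "archive-bulk-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "data/" },
      "Transitions": [
        { "Days": 0, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```

With a layout like this, metadata reads stay cheap and fast, while restores of old data pay only the one-time RestoreObject latency.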