restic
Document how expensive various operations are
restic supports a great number of cloud backup hosts (thanks!). They charge by the GB (so that part of the price is fairly easy to estimate), but they also charge for various operations (e.g. listing files). Without any knowledge of how restic works, it's hard for users to estimate how many and what kind of operations restic will perform (and what that may cost).
I'm sure much depends on your actual usage of restic and I guess different cloud providers classify different operations in different ways (class A, B, C), so it's hard to give a general answer.
But maybe the manual could describe, in rough and generic terms, what kind and number of operations to expect. Again, maybe this is too difficult or too cloud provider specific, but for example it seems that check is fairly "expensive" whereas a normal backup seems fairly inexpensive.
I guess it also depends on the nature of your backup, e.g. a lot of small files vs some big files.
I'm wondering if something about this could be added to the documentation.
(If you think this doesn't make sense, or would be too complicated or too tied to a particular cloud provider/backend, feel free to close. Maybe the answer is "get a free B2 account and play around with a small backup", which is what I'm doing right now.)
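To make the pricing model described above concrete (a per-GB storage charge plus per-operation charges), here is a minimal sketch of the arithmetic. The function name, the operation categories, and every price in it are invented placeholders, not any provider's actual rates or classes; substitute the figures from your own provider's price list.

```python
def estimate_monthly_cost(stored_gb, operations, price_per_gb, price_per_1000_ops):
    """Return a rough monthly cost: storage charge plus per-operation charges.

    operations:          dict mapping an operation kind to how often it ran
    price_per_1000_ops:  dict mapping the same kinds to a price per 1,000 calls
    All prices are caller-supplied assumptions.
    """
    storage_cost = stored_gb * price_per_gb
    operation_cost = sum(
        count / 1000 * price_per_1000_ops[kind]
        for kind, count in operations.items()
    )
    return storage_cost + operation_cost


# Placeholder numbers for illustration only.
example_cost = estimate_monthly_cost(
    stored_gb=100,
    operations={"list": 2_000, "get": 10_000, "put": 500},
    price_per_gb=0.005,
    price_per_1000_ops={"list": 0.004, "get": 0.0004, "put": 0.0},
)
print(f"estimated monthly cost: {example_cost:.3f}")
```

The hard part is not this formula but knowing the operation counts, which is exactly the information this issue asks for.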
Hi,
I think this would probably be better suited to the forum. That way a conversation can happen, allowing others to freely write which providers they use and why.
:wq sean
I agree the forum is the right place to discuss practical experience, but I feel some information might be useful in the manual, e.g. saying that "backup" is cheap while "check" is expensive.
Also, as a user I don't really care about the internals of restic since I trust restic and @fd0, but obviously this has an impact on cloud backup costs. For example, if I back up 10,000 small files, will that result in 10,000 put requests, or will restic pack all of that into a single pack (like Git does, afaik) and do a single put request?
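The difference between the two scenarios in that question can be sketched with some rough arithmetic. This is only an illustration under stated assumptions: the function name is made up, the 16 MiB pack size is an assumed parameter, and the estimate ignores deduplication, compression, and the extra uploads for metadata, index, and snapshot files.

```python
import math

ASSUMED_PACK_SIZE = 16 * 1024 * 1024  # bytes; an assumption, adjust as needed


def estimate_pack_uploads(file_sizes, pack_size=ASSUMED_PACK_SIZE):
    """Estimate upload requests if file data is bundled into packs of pack_size.

    Ignores deduplication, compression, and metadata/index/snapshot uploads,
    so treat the result as an order-of-magnitude figure only.
    """
    total_bytes = sum(file_sizes)
    return math.ceil(total_bytes / pack_size)


small_files = [4096] * 10_000  # 10,000 files of 4 KiB each

one_put_per_file = len(small_files)
packed_puts = estimate_pack_uploads(small_files)

print("one request per file:", one_put_per_file)
print("bundled into packs:  ", packed_puts)
```

Under these assumptions the request count scales with the total amount of data rather than with the number of files, which is why the answer matters for backups made up of many small files.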
I like the suggestion, we should document this on a rather abstract level, i.e. "restic snapshots lists all files in the snapshots/ dir, then reads each file". This hides the actual implementation details (so you still won't know which B2 API call is used), but users can then get a feeling for how complex each operation is.
Sounds good to me. Thanks!
Unsure if this is related enough or warrants a new issue, but it would also be nice to document which files are "add only" and which files get replaced. This is useful, for example, when coming up with rules in S3 to automatically move the data files to Glacier storage.
Might as well scratch my previous comment. Restic doesn't really work with S3 when migrating data to Glacier or Deep Archive, since Restic needs to be able to read from older data files when doing new backups. I'll just switch back to backing up to a directory, and then syncing that to S3.
As an alternative to documentation, would it perhaps be an idea to add a mode in which Restic reports how many and which calls it made to a cloud backend (i.e. LIST, GET, PUT, etc.)? Basically, just keep a counter for the specific cloud operations and display those counters after command completion (for example, when a special flag has been passed). That way one can also monitor it with something like Prometheus. As an added bonus, users can see the numbers for their own situation.
I think keeping documentation in sync with the actual implementation might prove hard.
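The counter idea above could look roughly like the following wrapper. This is a hypothetical sketch, not restic's actual backend interface: `InMemoryBackend`, `CountingBackend`, and their `put`/`get`/`list` methods are invented for illustration. The usage part mimics the abstract description given earlier in this thread for listing snapshots (list the `snapshots/` directory, then read each file).

```python
from collections import Counter


class InMemoryBackend:
    """Hypothetical stand-in for a cloud backend (not restic's real interface)."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = data

    def get(self, name):
        return self._objects[name]

    def list(self, prefix):
        return sorted(n for n in self._objects if n.startswith(prefix))


class CountingBackend:
    """Wraps another backend and counts each kind of call passed through it."""

    def __init__(self, inner):
        self.inner = inner
        self.counts = Counter()

    def put(self, name, data):
        self.counts["PUT"] += 1
        return self.inner.put(name, data)

    def get(self, name):
        self.counts["GET"] += 1
        return self.inner.get(name)

    def list(self, prefix):
        self.counts["LIST"] += 1
        return self.inner.list(prefix)


backend = CountingBackend(InMemoryBackend())

# Store three snapshot files, then list the directory and read each one.
for i in range(3):
    backend.put(f"snapshots/{i}", b"snapshot metadata")
for name in backend.list("snapshots/"):
    backend.get(name)

# Report the counters after the "command" has finished.
for operation, count in sorted(backend.counts.items()):
    print(f"{operation}: {count}")
```

Because the counting lives in a wrapper, the numbers always reflect what was actually executed, which sidesteps the concern about documentation drifting out of sync with the implementation.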
I'm on the same track as @siepkes. It would provide limited value/facts to users if we just describe in general terms what's going on and what operations are performed. The user would still have to count and calculate a lot of things to get any actually useful conclusion from it in terms of knowing how expensive it will be on their block storage.
So a better solution would probably be to count operations and, when the verbosity is raised to some specific level (perhaps not the first one), give the user hard facts for the various relevant commands. We could show how many reads, writes, and replaces were done, how much data was transferred, etc. This information would be more directly useful for the use case at hand.
A specific stats command or --stats argument is probably unnecessary; it would be better to add this to the argument we already have, namely --verbose/-v.
That said, for now it would possibly be useful to just document the operations in some way that is relevant for users, assuming this is actually relevant: how many users need this information? Won't most of them just try it and see what they get on their bill?