mt-aws-glacier
Recovery on failure in the middle of an upload/download
Hi,
Apologies if this is explained in the docs and I missed it (in which case please point me to the right place), but what happens if an upload/download stops suddenly before it is finished? E.g. the server crashes, there is a power cut, or someone hits CTRL-C by mistake.
Will mt-aws-glacier handle that well, i.e.:
- not upload another copy of the same file, which would mean paying Amazon for more storage space than necessary/intended? I don't know if Glacier allows such a thing, or if it would not "commit" the new file if it fails to upload fully, or if it would override the existing file if we upload it again. But I'm not keen on finding out with a big bill at the end of the month! ;)
- resume the upload/download where it stopped the next time the upload/download is attempted?
If not, is this something you would consider adding?
Failing this, I would have to split my reasonably big backup files (tens of GB) to limit the risk, which is not very convenient.
Thx, Thibault.
Hello.
When uploading:
If the upload is terminated at a random point, it is not finished, and the next upload will start from scratch. I think the only additional charges will be for requests (i.e. $0.05 per 1,000 requests at current pricing); the file will not be uploaded, so you won't be charged for additional storage.
There can be a race condition: after an upload is finished, mtglacier writes a record about it into the journal file (within several milliseconds, I think). If the process is terminated during that window, you'll get one duplicate, untracked file in Glacier and will pay for its storage in the future.
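The shape of that window can be pictured with a minimal shell sketch. This is illustrative only: the marker file and the journal line are made-up stand-ins, not mtglacier's actual journal format.

```shell
# Simulate the commit window: the remote upload finishes first (step 1),
# and only afterwards is a record appended to the journal (step 2).
# A crash between the two steps leaves an archive in the vault that
# the journal does not know about - an untracked duplicate.

touch archive_uploaded                                # step 1: upload completed
# <-- a crash exactly here produces the race condition
echo "CREATED fake-archive-id myfile" >> journal.log  # step 2: journal record

grep -c "CREATED" journal.log                         # 1 record when both steps ran
```

The point is simply that the two steps are not atomic, so a small crash window exists between them.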
When downloading (restore-completed):
The same: the download will start from scratch. Data downloaded into a temporary file before the crash is not reused (it is left on disk after a crash, or removed if it was Ctrl-C). I think you'll pay $0.05 per 1,000 requests plus bandwidth.
There are no other race conditions here.
For retrieving (i.e. the expensive operation):
The file is either retrieved or not. There can be a race condition (i.e. the file is retrieved, but the record did not reach the journal because of a crash within that short few-millisecond window).
Two things can be improved here:
- Reuse uploaded/downloaded data to save bandwidth. This is possible.
- Avoid the race conditions listed above (by writing a special record indicating an unfinished operation, before that operation). I think it's useless for upload, because even if a possible race condition is detected, mtglacier will have to wait 24+4 hours to fix it. For retrieval it's pretty useful and doable.
Also, you can actually work around the upload race condition yourself: wait 24 h (don't upload anything), run retrieve-inventory/download-inventory, compare the new journal with the old one, and use the new journal if there is an extra file.
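A hedged sketch of that comparison step, using simplified one-line-per-archive stand-ins rather than mtglacier's real journal format (the file names and record layout here are invented for illustration):

```shell
# Pretend journals: journal.new is the one rebuilt via
# retrieve-inventory / download-inventory; it has one extra archive
# that never made it into the old journal.
printf 'CREATED id-1 fileA\nCREATED id-2 fileB\n' > journal.old
printf 'CREATED id-1 fileA\nCREATED id-2 fileB\nCREATED id-3 fileC\n' > journal.new

sort journal.old > journal.old.sorted
sort journal.new > journal.new.sorted

# Records present only in the new journal = archives that were uploaded
# but never committed locally (the race-condition duplicates).
extra=$(comm -13 journal.old.sorted journal.new.sorted)
echo "$extra"

# If there is an extra record, adopt the rebuilt journal.
if [ -n "$extra" ]; then
  cp journal.new journal.log
fi
```

The real workflow would of course diff the actual journal files mtglacier produces; the mechanics (sort, compare, keep the superset) are the same.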
Note that when we talk about race conditions here, we assume there is no bug in the software and the crash time is truly random; such race conditions are really rare.
Failing this, I would have to split my reasonably big backup files (tens of GB)
If that size is big for you (i.e. it's a high percentage of all your data), it's recommended to split into smaller parts, because: a) you'll be able to pay less if you retrieve over a long period (see Amazon's pricing for retrieval); b) you can't download a file if it's too big for your bandwidth. Amazon discards completed retrieval jobs after 24 h, so you can download a file only if your bandwidth allows you to do so within 24 h (and factor in the risk of a possible crash, a download restart, bandwidth downtime, etc.). So it would be safe if, say, you can download the file within about 6 h.
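For the splitting itself, the stock `split` utility is enough; a small demonstration with a 5 MiB stand-in file and 1 MiB chunks (a real tens-of-GB backup would use something like `-b 1G`; the `-d` numeric-suffix option assumes GNU coreutils):

```shell
# Create a 5 MiB test file standing in for a big backup archive
dd if=/dev/zero of=backup.tar bs=1M count=5 2>/dev/null

# Cut it into 1 MiB numbered parts: backup.tar.part00, .part01, ...
split -b 1M -d backup.tar backup.tar.part

ls backup.tar.part*                    # lists the five parts

# Parts concatenate back losslessly, in lexicographic (= numeric) order
cat backup.tar.part* > restored.tar
cmp backup.tar restored.tar && echo "identical"
```

Each part is then uploaded as its own archive, so a crash costs you at most one part's worth of re-upload, and retrieval can be spread over time.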
note:
For retrieving (i.e. expensive operation) There can be race condition
Race conditions apply here only if you retrieve twice; there is no race condition if you retrieve once and then download.
Quote from the documentation of the restore command:
Initiate the Amazon Glacier RETRIEVE operation for files listed in the Journal which don't exist on the local filesystem and for which RETRIEVE was not initiated during the last 24 hours (that information is obtained from the Journal).
Quote from the documentation of the restore-completed command:
Unlike the restore command, the list of retrieved files is requested from the Amazon Glacier servers at runtime using the API, not taken from the journal.
I.e. restore-completed takes the file list from the Amazon servers.
Hi,
Thanks for the very quick feedback.
If Amazon Glacier doesn't "commit" the file until it has finished uploading successfully, then the worst that can happen is that we need to start again from scratch, which takes time but doesn't cost more (there is no per-GB upload fee). I'm ignoring the request fee, which is not really a problem for me.
And good point about the file split; it does look like splitting will be a good idea in my case anyway.
So we are left with a feature to resume downloads, which I think would be useful to avoid paying retrieval fees twice in case of crashes/failures, and better handling of the race conditions, which again sounds like a good idea to me.
Thx, Thibault.
then the worst that can happen is that we need to start again from scratch
Yes, except if a race condition happens.
So we are left with a feature to resume downloads, which I think would be useful to avoid paying retrieval
When you download you don't pay the high retrieval fee, only bandwidth and request fees. The retrieval fee is paid when you retrieve the file with the restore command. After that you can download the file several times with restore-completed without paying the high fee again.
which again sound like a good idea to me
Yes, I will leave this ticket open as an enhancement; most likely I will split it into several tickets in the future.
It's unlikely that everything listed here can be implemented soon (enhancements are low priority for me, while bugfixes are high; some of these enhancements are hard to implement, and some are not really important: I don't think other software vendors ever care about rare race conditions).
No problem at all; it's free software, and I fully understand if you don't have the time or the will to implement some or all of what is discussed here. Thanks for all the work you have already done on this useful tool.