amazon-glacier-cmd-interface
Handle timeouts
For some reason most of the time my uploads fail due to timeouts:
boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)
My workaround so far is to do this:
1. Perform a normal upload, without --resume and --uploadid.
2. Get the upload ID from glacier-cmd listmultipart.
3. Put the command in a loop:
while true
do
glacier-cmd --resume --uploadid "D9651-5d4f..." the_other_arguments
sleep 120
done
This way, when the upload times out, it resumes automatically after 2 minutes.
(You probably want to use large part sizes with that kind of setup.)
4. At some point glacier-cmd prints "str: Can not resume upload of this data as no existing job with this uploadid could be found.", meaning the upload is finished.
5. Press (and hold) Ctrl-C to get out of the loop (or rewrite the script to detect the "success" message or maybe the return status of the command).
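The "detect the success message" idea from step 5 could be sketched roughly as follows. This is only an illustration under assumptions: the message text is copied from step 4, it is not known whether glacier-cmd prints it on stdout or stderr (so both are captured), and the function name, the delay argument and the attempt cap are hypothetical. The same message could presumably also appear for a mistyped upload ID, so it is worth checking the archive afterwards.

```shell
#!/bin/sh
# Sketch: re-run a resume command until its output contains the message
# that the post above reads as "upload finished".
# DONE_MSG is taken from the message quoted in step 4; everything else
# (function name, arguments, attempt cap) is an illustrative assumption.
DONE_MSG="no existing job with this uploadid could be found"

# usage: resume_until_done DELAY_SECONDS MAX_ATTEMPTS command [args...]
resume_until_done() {
    delay=$1
    max_attempts=$2
    shift 2
    attempt=1
    while :; do
        # Capture stdout and stderr together, then echo them back so the
        # output is still visible (after each run rather than live).
        output=$("$@" 2>&1)
        printf '%s\n' "$output"
        case $output in
            *"$DONE_MSG"*) return 0 ;;   # message seen: stop looping
        esac
        # Give up once the attempt cap is reached.
        [ "$attempt" -ge "$max_attempts" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

With the loop from step 3, this would be called as: resume_until_done 120 50 glacier-cmd --resume --uploadid "D9651-5d4f..." the_other_arguments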
It would be nice if glacier-cmd handled timeouts itself. Glacier-cmd is useless to me without this workaround. Otherwise, except for trying to print hundreds of columns, from what I have seen it works pretty well.
This has always been an issue, and the problem appears to be on Amazon's side. When I was actively coding on this project I made several attempts at automatic retries within the code, so that you would not see these errors (until it got something like five timeouts in a row, indicating another issue). It happens time and again, and I have never been able to find a pattern in these timeout errors.
It seems like the code to retry on HTTP 408 is commented out in the main branch. I've enabled a tweaked version of it in gburca@85ef4aa but I haven't run into the 408's recently so I can't say for sure if it fixes the issue. @tiktaktok, if you want to try the patch, please enable logging. I'd be curious to know what values of "retry" and "total retries" you're seeing.
I just updated to gburca@85ef4aa and I am still seeing the same errors.
This is what I see in the console:
Traceback (most recent call last):e 8.85 MB/s, average 7.09 MB/s, ETA Tu
File "/usr/bin/glacier-cmd", line 9, in
I have encountered this issue repeatedly of late. Last night I found a simple little "fix" for it, and I have been uploading gigs of backlog since with no 408 Request timed out messages.
The fix is really a configuration change for boto. Just define a [Boto] section in your environment's configuration file and set num_retries to some small number. The default value happens to be None, as in, no retries will be performed. See http://docs.pythonboto.org/en/latest/boto_config_tut.html#boto for more information about the configuration file.
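For reference, a minimal sketch of the section described above, as it might appear in a boto configuration file such as ~/.boto. The value 5 is only an example of a "small number" and was not given in the thread.

```ini
; Sketch of the setting described above; 5 is an arbitrary example value.
[Boto]
num_retries = 5
```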
I happen to have my own code written to Layer1 of Boto, and this configuration tweak works like a charm.
@pchug - I have tried setting the number of retries to 10, 15, and 5, and I still get the same error. Could you tell me which changes made it work for you?
I am not using the amazon-glacier-cmd-interface, but instead a custom script that recursively handles any exception raised while uploading parts and resumes from the last uploaded part. With the exception handling it does resume, only to get the 408 Request timed out error again. Once in a blue moon it starts again, only to get interrupted after a short error-free dream run.
I have used the script to upload TBs of data, and we rarely got this error when operating in Tokyo and eu-west, but it is quite frequent in the Frankfurt (eu-central) region.
Any workarounds for this?