
Handle timeouts

mathieu-clement opened this issue 10 years ago • 6 comments

For some reason most of the time my uploads fail due to timeouts:

boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)

My workaround so far is to do this:

  1. Perform a normal upload, without --resume and --uploadid.
  2. Get the upload ID from glacier-cmd listmultipart.
  3. Put the command in a loop:

         while true
         do
             glacier-cmd --resume --uploadid "D9651-5d4f..." the_other_arguments
             sleep 120
         done

     This way, when the upload times out it resumes automatically after 2 minutes. (You probably want to use large part sizes with that kind of setup.)
  4. At some point glacier-cmd prints `str: Can not resume upload of this data as no existing job with this uploadid could be found.`, meaning the upload is finished.
  5. Press (and hold) Ctrl-C to get out of the loop. (Or rewrite the script to detect the "success" message, or perhaps check the return status of the command.)
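The loop in step 3 could be sketched so that it exits on its own instead of requiring Ctrl-C, as step 5 suggests. This is a minimal sketch, not part of glacier-cmd: `retry_until_done` is a hypothetical wrapper, and it assumes the "no existing job with this uploadid" message (quoted in step 4) reliably signals completion; adjust the pattern to the output you actually see.

```shell
#!/bin/sh
# Sketch: retry an upload command until it succeeds or until its output
# reports that the upload id no longer exists (i.e. the upload finished).
retry_until_done() {
    while true; do
        # Run the command passed as arguments, capturing stdout+stderr.
        output=$("$@" 2>&1)
        status=$?
        if [ "$status" -eq 0 ] || \
           echo "$output" | grep -q "no existing job with this uploadid"; then
            echo "upload finished"
            return 0
        fi
        # Wait before resuming, as in the original workaround.
        sleep 120
    done
}

# Hypothetical usage, mirroring the loop above:
# retry_until_done glacier-cmd --resume --uploadid "D9651-5d4f..." the_other_arguments
```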

It would be nice if glacier-cmd handled timeouts itself; it is useless to me without this workaround. Otherwise, apart from it trying to print hundreds of columns, it works pretty well from what I have seen.

mathieu-clement avatar Feb 11 '15 17:02 mathieu-clement

This has always been an issue, and the problem appears to be on Amazon's side. When I was actively coding on this project I made several attempts at automatic retries within the code, so that you wouldn't see these errors (unless it got something like five timeouts in a row, indicating another issue). It happens time and again; I have never been able to find a pattern in these timeout errors.

wvmarle avatar Feb 12 '15 06:02 wvmarle

It seems like the code to retry on HTTP 408 is commented out in the main branch. I've enabled a tweaked version of it in gburca@85ef4aa, but I haven't run into 408s recently, so I can't say for sure whether it fixes the issue. @tiktaktok, if you want to try the patch, please enable logging. I'd be curious to know what values of "retry" and "total retries" you're seeing.

gburca avatar May 13 '15 03:05 gburca

I just updated to gburca@85ef4aa and I am still seeing the same errors.

This is what I see in the console

Traceback (most recent call last):
  File "/usr/bin/glacier-cmd", line 9, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 929, in main
    args.func(args)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 156, in wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 309, in upload
    args.name, args.partsize, args.uploadid, args.resume)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 232, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 253, in sdb_connect_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 1157, in upload
    writer.write(part)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 129, in write
    data)
  File "/usr/lib/python2.6/site-packages/boto-2.29.1-py2.6.egg/boto/glacier/layer1.py", line 1278, in upload_part
    response_headers=response_headers)
  File "/usr/lib/python2.6/site-packages/boto-2.29.1-py2.6.egg/boto/glacier/layer1.py", line 118, in make_request
    raise UnexpectedHTTPResponseError(ok_responses, response)
boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)

hagleyj avatar May 14 '15 19:05 hagleyj

I have encountered this issue quite repeatedly lately. Last night I was able to identify a simple little "fix" for it, and I have been uploading gigs of backlog since with no 408 Request timed out messages.

The fix is really a configuration change for boto. Just define a [Boto] section in your environment's configuration file and set the num_retries to some small number. The default value happens to be None, as in, no retries will be performed. See http://docs.pythonboto.org/en/latest/boto_config_tut.html#boto for more information about the configuration file.
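Based on the description above, the relevant fragment of the boto configuration file (typically `~/.boto`, or whatever `BOTO_CONFIG` points to) would look something like this; the value 5 is just an arbitrary example of the "small number" mentioned:

```ini
; ~/.boto -- boto configuration file
[Boto]
; Number of times boto retries a failed request (default is no retries)
num_retries = 5
```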

I happen to have my own code written to Layer1 of Boto, and this configuration tweak works like a charm.

pchug avatar May 19 '15 00:05 pchug

@pchug - I have tried setting the number of retries to 10, 15, and 5, and I still get the same error. Could you tell me what changes you made that made it work?

I am not using amazon-glacier-cmd-interface, but instead a custom script that recursively handles any exception raised while uploading parts and resumes from the last uploaded part. With the exception handling it does resume, only to hit the 408 Request timed out error again. Once in a blue moon it starts up again, only to be interrupted after a short, error-free dream run.

I have used the script to upload TBs of data, and we rarely got this error when operating in Tokyo and eu-west, but it is quite frequent in the Frankfurt (eu-central) region.

AkshivBaluja avatar Mar 10 '16 05:03 AkshivBaluja

Any workarounds for this?

williamoverton avatar Sep 11 '17 15:09 williamoverton