internetarchive icon indicating copy to clipboard operation
internetarchive copied to clipboard

"Read timed out" when uploading large files; allow to set upload timeout ?

Open u1735067 opened this issue 4 years ago • 4 comments

Hello,

I used this client (with Python 3.6 installed) to upload large files to archive.org (thank you for the client) but when uploading large files I encountered this error:

# /root/ia upload --no-derive --checksum --metadata="mediatype:software" identifier bigfile.rar
identifier:

 uploading bigfile.rar:   0%|                                                                                                       | 0/6605 [00:00<?, ?MiB/s]
 uploading bigfile.rar: 100%|| 6605/6605 [05:52<00:00, 18.73MiB/s]
Traceback (most recent call last):
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib64/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.6/ssl.py", line 968, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 830, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 587, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.pex/install/requests-2.24.0-py2.py3-none-any.whl.9ec91e5091d964e80194e552b3c3e9388dd1f722/requests-2.24.0-py2.py3-none-any.whl/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/util/retry.py", line 410, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 428, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 336, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='s3.us.archive.org', port=443): Read timed out. (read timeout=120)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/ia/.bootstrap/_pex/pex.py", line 367, in execute
  File "/root/ia/.bootstrap/_pex/pex.py", line 293, in _wrap_coverage
  File "/root/ia/.bootstrap/_pex/pex.py", line 325, in _wrap_profiling
  File "/root/ia/.bootstrap/_pex/pex.py", line 410, in _execute
  File "/root/ia/.bootstrap/_pex/pex.py", line 468, in execute_entry
  File "/root/ia/.bootstrap/_pex/pex.py", line 486, in execute_pkg_resources
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/cli/ia.py", line 170, in main
    sys.exit(ia_module.main(argv, session))
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/cli/ia_upload.py", line 245, in main
    for _r in _upload_files(item, files, upload_kwargs):
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/cli/ia_upload.py", line 97, in _upload_files
    response = item.upload(files, **upload_kwargs)
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/item.py", line 1254, in upload
    request_kwargs=request_kwargs)
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/item.py", line 1060, in upload_file
    **request_kwargs)
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/session.py", line 541, in send
    raise e
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/session.py", line 538, in send
    reraise_modify(e, e.request.url, prepend=False)
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/utils.py", line 312, in reraise_modify
    traceback)
  File "/root/.pex/install/six-1.15.0-py2.py3-none-any.whl.8180e2276d844cb46f09eb1f9fd7d96ea769f9b5/six-1.15.0-py2.py3-none-any.whl/six.py", line 703, in reraise
    raise value
  File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/session.py", line 535, in send
    r = super(ArchiveSession, self).send(request, **kwargs)
  File "/root/.pex/install/requests-2.24.0-py2.py3-none-any.whl.9ec91e5091d964e80194e552b3c3e9388dd1f722/requests-2.24.0-py2.py3-none-any.whl/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/root/.pex/install/requests-2.24.0-py2.py3-none-any.whl.9ec91e5091d964e80194e552b3c3e9388dd1f722/requests-2.24.0-py2.py3-none-any.whl/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='s3.us.archive.org', port=443): Read timed out. (read timeout=120)",), 'https://s3.us.archive.org/identifier/bigfile.rar')

I was able to workaround it by using the Pypi version in venv (instead of the binary) and modifying https://github.com/jjjake/internetarchive/blob/v1.9.6/internetarchive/item.py#L938 to replace 120 by 600, but I see most of the code is already present (https://github.com/jjjake/internetarchive/blob/v1.9.6/internetarchive/cli/ia_upload.py#L216) to modify this on demand, so maybe this could be exposed as an option?

Best, Alexandre

u1735067 avatar Jan 22 '21 22:01 u1735067

This is weird. That timeout parameter is actually for exceeding the time between receiving data on the raw socket, and it only applies after the upload is completed and the client is trying to read a response from the HTTP connection. This would mean that IA didn't acknowledge your upload for two minutes after the transfer completed. While this can take a few seconds, I've never seen hangs anywhere near that long (or such errors) among many large file uploads. Given the transfer speed in your log, it's clearly not an extreme latency issue or similar either. Was this just a one-off issue at the time, or does it happen regularly and (somewhat) reproducibly?

JustAnotherArchivist avatar Feb 12 '22 06:02 JustAnotherArchivist

I've not used IA since (one time upload), but at the time it was reproducible ; since it was a big upload I don't think I've tried too much to avoid impacting the infra.

u1735067 avatar Feb 13 '22 20:02 u1735067

Perhaps you were hitting a badly overloaded S3 node or item server or something like that.

JustAnotherArchivist avatar Feb 16 '22 22:02 JustAnotherArchivist

I'm frequently encountering this one. The problem is that _build_request is not catching ReadTimeout although it contains the logic regarding retries. As such _build_request retries in case of 5xx errors (which, btw, aren't likely to vary on subsequent calls) but doesn't in case of a ReadTimeout (which really desserve a retry)

Ex: item.py, l.1077

                    try:
                        response = self.session.send(prepared_request,
                                                     stream=True,
                                                     **request_kwargs)
                    except ReadTimeout as error_msg:
                        if verbose:
                            print(f' warning: {error_msg}', file=sys.stderr)
                        sleep(retries_sleep)
                        retries -= 1
                        continue

drzraf avatar Sep 25 '23 20:09 drzraf