internetarchive
internetarchive copied to clipboard
"Read timed out" when uploading large files; allow to set upload timeout ?
Hello,
I used this client (with Python 3.6 installed) to upload large files to archive.org (thank you for the client) but when uploading large files I encountered this error:
# /root/ia upload --no-derive --checksum --metadata="mediatype:software" identifier bigfile.rar
identifier:
uploading bigfile.rar: 0%| | 0/6605 [00:00<?, ?MiB/s]
uploading bigfile.rar: 100%|| 6605/6605 [05:52<00:00, 18.73MiB/s]
Traceback (most recent call last):
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 426, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib64/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/usr/lib64/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python3.6/http/client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib64/python3.6/socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "/usr/lib64/python3.6/ssl.py", line 968, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib64/python3.6/ssl.py", line 830, in read
return self._sslobj.read(len, buffer)
File "/usr/lib64/python3.6/ssl.py", line 587, in read
v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.pex/install/requests-2.24.0-py2.py3-none-any.whl.9ec91e5091d964e80194e552b3c3e9388dd1f722/requests-2.24.0-py2.py3-none-any.whl/requests/adapters.py", line 449, in send
timeout=timeout
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/util/retry.py", line 410, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/packages/six.py", line 735, in reraise
raise value
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 428, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/root/.pex/install/urllib3-1.25.11-py2.py3-none-any.whl.abb1c418ef262db791ed65e368974747d34f2c81/urllib3-1.25.11-py2.py3-none-any.whl/urllib3/connectionpool.py", line 336, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='s3.us.archive.org', port=443): Read timed out. (read timeout=120)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/ia/.bootstrap/_pex/pex.py", line 367, in execute
File "/root/ia/.bootstrap/_pex/pex.py", line 293, in _wrap_coverage
File "/root/ia/.bootstrap/_pex/pex.py", line 325, in _wrap_profiling
File "/root/ia/.bootstrap/_pex/pex.py", line 410, in _execute
File "/root/ia/.bootstrap/_pex/pex.py", line 468, in execute_entry
File "/root/ia/.bootstrap/_pex/pex.py", line 486, in execute_pkg_resources
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/cli/ia.py", line 170, in main
sys.exit(ia_module.main(argv, session))
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/cli/ia_upload.py", line 245, in main
for _r in _upload_files(item, files, upload_kwargs):
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/cli/ia_upload.py", line 97, in _upload_files
response = item.upload(files, **upload_kwargs)
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/item.py", line 1254, in upload
request_kwargs=request_kwargs)
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/item.py", line 1060, in upload_file
**request_kwargs)
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/session.py", line 541, in send
raise e
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/session.py", line 538, in send
reraise_modify(e, e.request.url, prepend=False)
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/utils.py", line 312, in reraise_modify
traceback)
File "/root/.pex/install/six-1.15.0-py2.py3-none-any.whl.8180e2276d844cb46f09eb1f9fd7d96ea769f9b5/six-1.15.0-py2.py3-none-any.whl/six.py", line 703, in reraise
raise value
File "/root/.pex/install/internetarchive-1.9.6-py2.py3-none-any.whl.d73c5b7637449a4323ae15fa0c908640040b11e0/internetarchive-1.9.6-py2.py3-none-any.whl/internetarchive/session.py", line 535, in send
r = super(ArchiveSession, self).send(request, **kwargs)
File "/root/.pex/install/requests-2.24.0-py2.py3-none-any.whl.9ec91e5091d964e80194e552b3c3e9388dd1f722/requests-2.24.0-py2.py3-none-any.whl/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/root/.pex/install/requests-2.24.0-py2.py3-none-any.whl.9ec91e5091d964e80194e552b3c3e9388dd1f722/requests-2.24.0-py2.py3-none-any.whl/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='s3.us.archive.org', port=443): Read timed out. (read timeout=120)",), 'https://s3.us.archive.org/identifier/bigfile.rar')
I was able to workaround it by using the Pypi version in venv (instead of the binary) and modifying https://github.com/jjjake/internetarchive/blob/v1.9.6/internetarchive/item.py#L938 to replace 120 by 600, but I see most of the code is already present (https://github.com/jjjake/internetarchive/blob/v1.9.6/internetarchive/cli/ia_upload.py#L216) to modify this on demand, so maybe this could be exposed as an option?
Best, Alexandre
This is weird. That timeout parameter is actually for exceeding the time between receiving data on the raw socket, and it only applies after the upload is completed and the client is trying to read a response from the HTTP connection. This would mean that IA didn't acknowledge your upload for two minutes after the transfer completed. While this can take a few seconds, I've never seen hangs anywhere near that long (or such errors) among many large file uploads. Given the transfer speed in your log, it's clearly not an extreme latency issue or similar either. Was this just a one-off issue at the time, or does it happen regularly and (somewhat) reproducibly?
I've not used IA since (one time upload), but at the time it was reproducible ; since it was a big upload I don't think I've tried too much to avoid impacting the infra.
Perhaps you were hitting a badly overloaded S3 node or item server or something like that.
I'm frequently encountering this one.
The problem is that _build_request
is not catching ReadTimeout
although it contains the logic regarding retries.
As such _build_request
retries in case of 5xx errors (which, btw, aren't likely to vary on subsequent calls) but doesn't in case of a ReadTimeout (which really desserve a retry)
Ex: item.py
, l.1077
try:
response = self.session.send(prepared_request,
stream=True,
**request_kwargs)
except ReadTimeout as error_msg:
if verbose:
print(f' warning: {error_msg}', file=sys.stderr)
sleep(retries_sleep)
retries -= 1
continue