bulk-downloader-for-reddit
[FEATURE] Better connection timed out handling
- [X] I am requesting a feature.
- [X] I am running the latest version of BDfR
- [X] I have read the Opening an issue
Description
When downloading many submissions, a Connection timed out error will sometimes occur. In this instance it happened while trying to download an i.redd.it image (the image does exist). If you used a requests session instead of a bare requests call, you could use the retry/backoff functionality that a session supports through its transport adapters.
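As a rough illustration of the session idea, here is a minimal sketch that mounts an `HTTPAdapter` configured with urllib3's `Retry` on a `requests.Session`. The function name `build_retrying_session`, the retry count, the backoff factor, and the status code list are all assumptions chosen for the example; none of this is taken from the BDFR code base.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(total_retries: int = 3, backoff_factor: float = 1.0) -> requests.Session:
    """Return a Session whose adapters retry failed requests with backoff.

    Hypothetical helper: the defaults are illustrative, not BDFR settings.
    """
    retry = Retry(
        total=total_retries,          # overall cap on retries
        connect=total_retries,        # cap for connection errors
        read=total_retries,           # cap for read errors such as read timeouts
        backoff_factor=backoff_factor,  # controls the growing sleep between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # also retry on these responses
    )
    adapter = HTTPAdapter(max_retries=retry)

    session = requests.Session()
    # Every request made through the session for these schemes uses the adapter.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

A call such as `build_retrying_session().get(url, headers=headers, timeout=(10, 30))` would then retry before giving up. Passing an explicit `timeout` probably matters here as well, since the traceback below reports `read timeout=None`.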
Traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "/usr/local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/usr/local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/local/lib/python3.10/ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "/usr/local/lib/python3.10/ssl.py", line 1071, in _create
self.do_handshake()
File "/usr/local/lib/python3.10/ssl.py", line 1342, in do_handshake
self._sslobj.do_handshake()
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 389, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 340, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='i.redd.it', port=443): Read timed out. (read timeout=None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/bdfr/__main__.py", line 160, in <module>
cli()
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/bdfr/__main__.py", line 89, in cli_download
reddit_downloader.download()
File "/usr/local/lib/python3.10/site-packages/bdfr/downloader.py", line 45, in download
self._download_submission(submission)
File "/usr/local/lib/python3.10/site-packages/bdfr/downloader.py", line 104, in _download_submission
res.download({'max_wait_time': self.args.max_wait_time})
File "/usr/local/lib/python3.10/site-packages/bdfr/resource.py", line 40, in download
content = self.download_function(download_parameters)
File "/usr/local/lib/python3.10/site-packages/bdfr/resource.py", line 33, in <lambda>
return lambda global_params: Resource.http_download(url, global_params)
File "/usr/local/lib/python3.10/site-packages/bdfr/resource.py", line 70, in http_download
response = requests.get(url, headers=headers)
File "/usr/local/lib/python3.10/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 578, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='i.redd.it', port=443): Read timed out. (read timeout=None)
Please read the bug and feature request forms. Where is this traceback from? What are the associated logs? Did this crash the BDFR or did it handle it and move on?
Where is this traceback from?
[2022-10-17 06:55:13,322 - bdfr.downloader - INFO] - Downloaded submission y5oj0l from battlestations
[2022-10-17 06:55:13,322 - bdfr.downloader - DEBUG] - Attempting to download submission y5oj0l
[2022-10-17 06:55:13,322 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/agjmrxm8r7u91.jpg
[2022-10-17 07:11:15,278 - root - ERROR] - Downloader exited unexpectedly
What are the associated logs?
Did this crash the BDFR or did it handle it and move on?
My bad. Yes, BDFR fully crashed when this occurred.
I have only noticed it happening with reddit-hosted images so far. I can't reproduce it consistently, and it doesn't seem to be a rate-limit issue as I first assumed.
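Since a single timed-out image ends the whole run, one possible mitigation is to catch the `requests` exception for each download and move on to the next one. The sketch below assumes a hypothetical `download_all` helper and a caller-supplied `fetch` function; it is not how BDFR structures its downloader.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

import requests

logger = logging.getLogger(__name__)


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch each URL, logging and skipping the ones that fail.

    Hypothetical helper: `fetch` stands in for whatever performs the HTTP call.
    """
    results: Dict[str, bytes] = {}
    failed: List[str] = []
    for url in urls:
        try:
            results[url] = fetch(url)
        except requests.exceptions.RequestException as exc:
            # ReadTimeout and ConnectionError are both RequestException subclasses,
            # so a timeout on one resource no longer aborts the whole batch.
            logger.error("Failed to download %s: %s", url, exc)
            failed.append(url)
    return results, failed
```

Returning the failed URLs separately leaves it to the caller to retry them later or simply report them at the end of the run.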
I recognise I am bringing an old post back to life, but I haven't seen anything happen on this since 2022. I am facing the same issue: BDFR times out and then crashes on some i.redd.it images.
I am using BDFR 2.7.0.
Output from the most recent error is below. I'm happy to provide more information or evidence as required:
[2024-04-03 20:18:18,951 - bdfr.downloader - DEBUG] - Attempting to download submission 1bruwpw
[2024-04-03 20:18:18,951 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/ilxpqaz2xjrc1.gif
[2024-04-03 20:18:28,996 - root - ERROR] - Downloader exited unexpectedly - BDFR Downloader v2.7.0
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 466, in _make_request
six.raise_from(e, None)
File "
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 798, in urlopen
retries = retries.increment(
File "C:\Program Files\Python310\lib\site-packages\urllib3\util\retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Program Files\Python310\lib\site-packages\urllib3\packages\six.py", line 770, in reraise
raise value
File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 468, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 357, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='i.redd.it', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Program Files\Python310\Scripts\bdfr.exe_main.py", line 7, in