bulk-downloader-for-reddit

[FEATURE] Better connection timed out handling

Open gageirwin opened this issue 1 year ago • 3 comments

  • [X] I am requesting a feature.
  • [X] I am running the latest version of BDfR
  • [X] I have read the Opening an issue guide

Description

When downloading many submissions, a connection timeout sometimes occurs. In this instance it happened while trying to download an i.redd.it image (the image does exist). If you used a requests session instead of a bare requests call, you could use the session's built-in retry/backoff functionality.
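For illustration, here is a minimal sketch of the session-based approach, assuming the retry policy is attached through an `HTTPAdapter` with a urllib3 `Retry` object. The helper name `make_retrying_session` and the retry counts are hypothetical and are not taken from BDFR's code.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total_retries=3, backoff_factor=1.0):
    """Build a requests.Session that retries failed requests with backoff.

    Hypothetical helper for illustration only; the numbers are assumptions.
    """
    retry = Retry(
        total=total_retries,        # overall cap on retries
        connect=total_retries,      # retries for connection errors
        read=total_retries,         # retries for read errors and read timeouts
        backoff_factor=backoff_factor,  # increasing sleep between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # also retry these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


# Usage (not executed here, since it needs network access):
# session = make_retrying_session()
# response = session.get(url, headers=headers, timeout=10)
```

Passing an explicit `timeout` still matters: the traceback below shows `read timeout=None`, and without a timeout a stalled connection can block for a long time before any retry logic runs.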

Traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 389, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 340, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='i.redd.it', port=443): Read timed out. (read timeout=None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/bdfr/__main__.py", line 160, in <module>
    cli()
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bdfr/__main__.py", line 89, in cli_download
    reddit_downloader.download()
  File "/usr/local/lib/python3.10/site-packages/bdfr/downloader.py", line 45, in download
    self._download_submission(submission)
  File "/usr/local/lib/python3.10/site-packages/bdfr/downloader.py", line 104, in _download_submission
    res.download({'max_wait_time': self.args.max_wait_time})
  File "/usr/local/lib/python3.10/site-packages/bdfr/resource.py", line 40, in download
    content = self.download_function(download_parameters)
  File "/usr/local/lib/python3.10/site-packages/bdfr/resource.py", line 33, in <lambda>
    return lambda global_params: Resource.http_download(url, global_params)
  File "/usr/local/lib/python3.10/site-packages/bdfr/resource.py", line 70, in http_download
    response = requests.get(url, headers=headers)
  File "/usr/local/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 578, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='i.redd.it', port=443): Read timed out. (read timeout=None)

gageirwin avatar Oct 17 '22 05:10 gageirwin

Please read the bug and feature request forms. Where is this traceback from? What are the associated logs? Did this crash the BDFR or did it handle it and move on?

Serene-Arc avatar Oct 17 '22 11:10 Serene-Arc

Where is this traceback from?

[2022-10-17 06:55:13,322 - bdfr.downloader - INFO] - Downloaded submission y5oj0l from battlestations
[2022-10-17 06:55:13,322 - bdfr.downloader - DEBUG] - Attempting to download submission y5oj0l
[2022-10-17 06:55:13,322 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/agjmrxm8r7u91.jpg
[2022-10-17 07:11:15,278 - root - ERROR] - Downloader exited unexpectedly

What are the associated logs?

log.txt

Did this crash the BDFR or did it handle it and move on?

My bad. Yes, BDFR fully crashed when this occurred.

I have only noticed it happening with reddit-hosted images so far. It isn't consistently reproducible, and it doesn't seem to be a rate limit issue as I first assumed.
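As a rough sketch of the "handle it and move on" behaviour, the download call could be wrapped so that a timeout is retried a few times and then skipped instead of propagating up and ending the run. The function `download_with_retries` and its parameters are hypothetical, not part of BDFR. With requests, `retry_on` would need to be something like `(requests.exceptions.Timeout, requests.exceptions.ConnectionError)`, since those do not derive from the built-in exceptions used as defaults here.

```python
import logging
import time

logger = logging.getLogger(__name__)


def download_with_retries(fetch, url, attempts=3, base_delay=1.0,
                          retry_on=(TimeoutError, ConnectionError),
                          sleep=time.sleep):
    """Call fetch(url), retrying on transient errors with exponential backoff.

    Returns the result of fetch(url), or None if every attempt fails, so the
    caller can log the failure and continue with the next submission.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on as exc:
            if attempt == attempts:
                logger.error("Giving up on %s after %d attempts: %s",
                             url, attempts, exc)
                return None
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d for %s failed (%s); retrying in %.1fs",
                           attempt, url, exc, delay)
            sleep(delay)
```

The `sleep` parameter is injectable only so the backoff can be exercised without real waiting.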

gageirwin avatar Oct 17 '22 17:10 gageirwin

I recognise I am bringing an old post back to life, but I haven't seen anything happen on this since 2022. I am facing the same issue: BDFR times out and then crashes on some i.redd.it images.

I am using BDFR 2.7.0.

Output from the most recent error follows; I'm happy to provide more information or evidence as required:

[2024-04-03 20:18:18,951 - bdfr.downloader - DEBUG] - Attempting to download submission 1bruwpw
[2024-04-03 20:18:18,951 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/ilxpqaz2xjrc1.gif
[2024-04-03 20:18:28,996 - root - ERROR] - Downloader exited unexpectedly - BDFR Downloader v2.7.0
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Program Files\Python310\lib\http\client.py", line 1375, in getresponse
    response.begin()
  File "C:\Program Files\Python310\lib\http\client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "C:\Program Files\Python310\lib\http\client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Program Files\Python310\lib\socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "C:\Program Files\Python310\lib\ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Program Files\Python310\lib\ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
  File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "C:\Program Files\Python310\lib\site-packages\urllib3\util\retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Program Files\Python310\lib\site-packages\urllib3\packages\six.py", line 770, in reraise
    raise value
  File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 468, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\Program Files\Python310\lib\site-packages\urllib3\connectionpool.py", line 357, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='i.redd.it', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python310\Scripts\bdfr.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\bdfr\__main__.py", line 123, in cli_download
    reddit_downloader.download()
  File "C:\Program Files\Python310\lib\site-packages\bdfr\downloader.py", line 49, in download
    self._download_submission(submission)
  File "C:\Program Files\Python310\lib\site-packages\bdfr\downloader.py", line 118, in _download_submission
    res.download({"max_wait_time": self.args.max_wait_time})
  File "C:\Program Files\Python310\lib\site-packages\bdfr\resource.py", line 46, in download
    content = self.download_function(download_parameters)
  File "C:\Program Files\Python310\lib\site-packages\bdfr\resource.py", line 39, in <lambda>
    return lambda global_params: Resource.http_download(url, global_params)
  File "C:\Program Files\Python310\lib\site-packages\bdfr\resource.py", line 76, in http_download
    response = requests.get(url, headers=headers, timeout=10)
  File "C:\Program Files\Python310\lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files\Python310\lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\requests\adapters.py", line 578, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='i.redd.it', port=443): Read timed out. (read timeout=10)

gribzy-uk avatar Apr 03 '24 19:04 gribzy-uk