Downloading from Web Archive through retro-proxy

Open mcepl opened this issue 1 year ago • 0 comments

I am trying to download http://archiveofourown.org/works/8741551, which is no longer available on the live AO3 but is archived at https://web.archive.org/web/20170214224728/archiveofourown.org/works/8741551. I tried routing FanFicFare through two retro-proxies, https://github.com/remino/timeprox and https://github.com/richardg867/WaybackProxy, and both fail with this error. The output is from timeprox, with line 42 of server.js modified to fetch from https://web.archive.org/web/20170214224728/${url}:
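For clarity, the rewrite the proxy is expected to perform can be sketched as follows. This is only an illustration of the URL mapping described above, not code from timeprox or WaybackProxy; the function name `wayback_url` and its default timestamp argument are hypothetical.

```python
def wayback_url(original_url: str, timestamp: str = "20170214224728") -> str:
    """Map an original URL to a Wayback Machine snapshot URL.

    `timestamp` is the snapshot time in YYYYMMDDhhmmss form, as it
    appears in the archived link quoted in this issue.
    """
    if not timestamp.isdigit():
        raise ValueError("timestamp must contain digits only")
    # Same shape as the modified template: .../web/<timestamp>/<url>
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url("http://archiveofourown.org/works/8741551"))
```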

stitny~/K/f/tmp$ fanficfare -d -o http_proxy=http://127.0.0.1:3000 -o https_proxy=http://127.0.0.1:3000 http://archiveofourown.org/works/8741551
FFF: DEBUG: 2022-08-13 18:20:14,224: cli.py(230):     OS Version:Linux-5.18.15-1-default-x86_64-with-glibc2.35
FFF: DEBUG: 2022-08-13 18:20:14,224: cli.py(231): Python Version:3.10.6 (main, Aug 02 2022, 17:22:31) [GCC]
FFF: DEBUG: 2022-08-13 18:20:14,224: cli.py(232):    FFF Version:4.14.3
FFF: DEBUG: 2022-08-13 18:20:14,239: configurable.py(1044): use_browser_cache:
FFF: DEBUG: 2022-08-13 18:20:14,239: configurable.py(1058): use_basic_cache:true
FFF: INFO: 2022-08-13 18:20:14,246: adapter_archiveofourownorg.py(163): url: https://archiveofourown.org/works/8741551/navigate?view_adult=true
FFF: INFO: 2022-08-13 18:20:14,246: adapter_archiveofourownorg.py(164): metaurl: https://archiveofourown.org/works/8741551?view_adult=true
FFF: DEBUG: 2022-08-13 18:20:14,247: fetcher.py(234): 
========== MISS (GET) BasicCache
https://archiveofourown.org/works/8741551/navigate?view_adult=true
FFF: DEBUG: 2022-08-13 18:20:14,247: fetcher.py(469): 
---------- REQ (GET) RequestsFetcher
https://archiveofourown.org/works/8741551/navigate?view_adult=true
FFF: DEBUG: 2022-08-13 18:20:14,248: fetcher.py(450): Session Proxies After INI:{'http': 'http://127.0.0.1:3000', 'https': 'http://127.0.0.1:3000'}
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/urllib3/connectionpool.py", line 700, in urlopen
    self._prepare_proxy(conn)
  File "/usr/lib/python3.10/site-packages/urllib3/connectionpool.py", line 996, in _prepare_proxy
    conn.connect()
  File "/usr/lib/python3.10/site-packages/urllib3/connection.py", line 369, in connect
    self._tunnel()
  File "/usr/lib64/python3.10/http/client.py", line 920, in _tunnel
    (version, code, message) = response._read_status()
  File "/usr/lib64/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='archiveofourown.org', port=443): Max retries exceeded with url: /works/8741551/navigate?view_adult=true (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response')))

During handling of the above exception, another exception occurred:
stitny~/K/f/tmp$ 

(using urllib3 1.26.11 from openSUSE package with no patches on the main code)
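
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the command line passes an http:// URL, but the log shows the AO3 adapter requesting https://archiveofourown.org/..., and for an https:// target behind an HTTP proxy the client first sends a CONNECT request to open a tunnel (the `_prepare_proxy` and `_tunnel` frames). A proxy that only rewrites plain HTTP requests and drops CONNECT would be expected to produce this kind of `RemoteDisconnected`. The sketch below reproduces the symptom with the Python standard library only; `probe_tunnel` and the throwaway local listener are hypothetical stand-ins, not timeprox or FanFicFare code.

```python
import http.client
import socket
import threading


def _drop_after_request(server_sock, seen):
    """Stand-in proxy: read one request head, record its first line, hang up."""
    conn, _ = server_sock.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        seen.append(data.split(b"\r\n", 1)[0].decode("latin-1"))
    # Leaving the `with` block closes the socket without sending any reply.


def probe_tunnel(target_host, target_port=443):
    """Ask a local stand-in proxy for a tunnel and report what happened."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # any free local port
    server.listen(1)
    seen = []
    worker = threading.Thread(
        target=_drop_after_request, args=(server, seen), daemon=True
    )
    worker.start()

    client = http.client.HTTPConnection(
        "127.0.0.1", server.getsockname()[1], timeout=5
    )
    # Same mechanism the traceback shows: CONNECT is sent before any GET.
    client.set_tunnel(target_host, target_port)
    try:
        client.connect()
        outcome = "tunnel established"
    except http.client.RemoteDisconnected as exc:
        outcome = f"RemoteDisconnected: {exc}"
    finally:
        client.close()
        worker.join(timeout=5)
        server.close()
    return seen[0], outcome


if __name__ == "__main__":
    request_line, outcome = probe_tunnel("archiveofourown.org")
    print(request_line)
    print(outcome)
```

If this reading is right, the request never reaches the point where the proxy could rewrite the URL, because the connection is dropped during tunnel setup. Whether timeprox or WaybackProxy actually handle CONNECT is not something this sketch establishes.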

Do you have any idea what’s going on, please?

mcepl, Aug 13 '22 16:08