wayback-machine-archiver

TooManyRedirects: Exceeded 30 redirects

Open Melonadev opened this issue 3 years ago • 21 comments

I followed your suggestion on the Server Error issue:

Anyway, the most common reason for a 500 error is the Internet Archive rate-limiting you. My suggestion is to turn the --rate-limit-wait parameter higher! It defaults to 5 seconds; I'd try 30 or even 60.

I tried 30 and this happens:

Microsoft Windows [Version 10.0.18363.1082]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\yewhe\Downloads>archiver --file fest.txt --rate-limit-wait 30
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 35, in call_archiver
    r = session.head(request_url, allow_redirects=True)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 553, in head
    return self.request('HEAD', url, **kwargs)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 518, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 137, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 243, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Melonadev avatar Oct 15 '20 07:10 Melonadev

Here's the txt file with urls I attempted to archive: fest.txt

Melonadev avatar Oct 15 '20 07:10 Melonadev

Interesting error! I've never seen this one before.

I would guess it's a single URL from your list that has some weird script redirection, or something else like that. I'll rerun with logging and see what happens.
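One way to narrow this down is to request each URL on its own and record which ones raise. The snippet below is only a sketch, not the archiver's code: `head_with_requests` and `find_failing_urls` are hypothetical helper names, and the 60-second timeout is an arbitrary assumption. The HEAD request with `allow_redirects=True` mirrors the call shown in the traceback.

```python
from typing import Callable, Iterable, List, Tuple


def head_with_requests(url: str) -> int:
    """Send a HEAD request, following redirects, and return the final status code."""
    # Third-party dependency, imported lazily so find_failing_urls can be
    # used (and tested) with any other fetch function.
    import requests

    with requests.Session() as session:
        # timeout=60 is an assumption for this sketch, not an archiver setting.
        response = session.head(url, allow_redirects=True, timeout=60)
        return response.status_code


def find_failing_urls(
    urls: Iterable[str],
    fetch: Callable[[str], int] = head_with_requests,
) -> List[Tuple[str, str]]:
    """Try each URL separately and collect (url, error description) pairs."""
    failures = []
    for url in urls:
        try:
            fetch(url)
        except Exception as exc:  # e.g. requests.exceptions.TooManyRedirects
            failures.append((url, f"{type(exc).__name__}: {exc}"))
    return failures
```

To use it against the list, read the lines of fest.txt, prefix each with `https://web.archive.org/save/`, and pass them to `find_failing_urls`. Any URL that comes back with a `TooManyRedirects` description is the likely culprit.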

agude avatar Oct 17 '20 17:10 agude

I haven't been able to reproduce this. Can you upgrade to the newest version of archiver (1.9.0) and run:

archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG > out.log 2>&1

That's what it would be on Linux, not sure on Windows. The > out.log 2>&1 is just saving the debug and error log to a file, but you can leave those out and copy/paste the output here as well.
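If the shell redirection is a problem, the same capture can be done from Python, which behaves the same on Linux and Windows. This is a minimal sketch; `run_and_log` is a hypothetical helper name, not part of archiver. It sends both stdout and stderr of the command to one file, which is what `> out.log 2>&1` does.

```python
import subprocess
from typing import Sequence


def run_and_log(command: Sequence[str], log_path: str) -> int:
    """Run command, write its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        # stderr=subprocess.STDOUT merges the error stream into the same file.
        completed = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode
```

For this issue the call would be `run_and_log(["archiver", "--file", "./fest.txt", "--rate-limit-wait", "30", "--log", "DEBUG"], "out.log")`, assuming `archiver` is on the PATH.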

agude avatar Oct 17 '20 19:10 agude

EDIT: This is still for version 1.8.1. I have since updated archiver to 1.9.0 and tried it again.

It appears to run successfully without errors, but the last entry of the log file says otherwise:

DEBUG:root:Arguments: Namespace(archive_sitemap=False, file='./fest.txt', jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=30, sitemaps=[], urls=[])
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
INFO:root:Parsing sitemaps
INFO:root:Reading urls from file: ./fest.txt
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/wellness
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/hub
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2019-2
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-lineup
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/personnel
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/videos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/photos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/past-shows
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-saturday-jan-11th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/marathon-map
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/archive
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/about
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/sponsorship
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/contact
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2018
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year
DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://www.winterjazzfest.com/2020-saturday-jan-11th', 'https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018', 'https://web.archive.org/save/https://www.winterjazzfest.com/contact', 'https://web.archive.org/save/https://www.winterjazzfest.com/past-shows', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2', 'https://web.archive.org/save/https://www.winterjazzfest.com/wellness', 'https://web.archive.org/save/https://www.winterjazzfest.com/hub', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj', 'https://web.archive.org/save/https://www.winterjazzfest.com/personnel', 'https://web.archive.org/save/https://www.winterjazzfest.com/', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest', 'https://web.archive.org/save/https://www.winterjazzfest.com/about', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon', 'https://web.archive.org/save/https://www.winterjazzfest.com/archive', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup', 'https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes', 'https://web.archive.org/save/https://www.winterjazzfest.com/sponsorship', 'https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation', 
'https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2020'}
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 243, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020

Melonadev avatar Oct 19 '20 02:10 Melonadev

Can you make sure you're running 1.9.0? I added more logging that prints the version number, and I don't see it in the above log. 1.9.0 should fix the 520 issue (at least as long as the 520s don't show up more than 5 times in a row).

Run:

archiver --version

And when you're running archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG make sure it says "Version 1.9.0" at the top of the log.
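If the log was saved to a file, the version check can be scripted. The sketch below assumes the version line has the form `DEBUG:root:Archiver Version: 1.9.0`, as it appears in the 1.9.0 logs in this thread; `find_logged_version` is a hypothetical helper name.

```python
import re
from typing import Iterable, Optional

# Assumed format of the version line: "...Archiver Version: <version>"
VERSION_PATTERN = re.compile(r"Archiver Version:\s*(\S+)")


def find_logged_version(lines: Iterable[str]) -> Optional[str]:
    """Return the first archiver version found in the log lines, or None."""
    for line in lines:
        match = VERSION_PATTERN.search(line)
        if match:
            return match.group(1)
    return None
```

A result of `None` means the version line is missing, which suggests the log came from a release older than 1.9.0.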

agude avatar Oct 19 '20 02:10 agude

I updated archiver to 1.9.0 and used archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG. It's been like this for more than 6 hours and it doesn't seem to have finished:

C:\Users\yewhe\Downloads\archiver files>archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG
DEBUG:root:Archiver Version: 1.9.0
DEBUG:root:Arguments: Namespace(archive_sitemap=False, file='./fest.txt', jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=30, sitemaps=[], urls=[])
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
INFO:root:Parsing sitemaps
INFO:root:Reading urls from file: ./fest.txt
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/wellness
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/hub
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2019-2
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-lineup
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/personnel
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/videos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/photos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/past-shows
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-saturday-jan-11th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/marathon-map
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/archive
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/about
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/sponsorship
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/contact
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2018
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year
DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://www.winterjazzfest.com/archive', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon', 'https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes', 'https://web.archive.org/save/https://www.winterjazzfest.com/', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th', 'https://web.archive.org/save/https://www.winterjazzfest.com/past-shows', 'https://web.archive.org/save/https://www.winterjazzfest.com/personnel', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-saturday-jan-11th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup', 'https://web.archive.org/save/https://www.winterjazzfest.com/sponsorship', 'https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th', 'https://web.archive.org/save/https://www.winterjazzfest.com/contact', 
'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax', 'https://web.archive.org/save/https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation', 'https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj', 'https://web.archive.org/save/https://www.winterjazzfest.com/hub', 'https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest', 
'https://web.archive.org/save/https://www.winterjazzfest.com/about', 'https://web.archive.org/save/https://www.winterjazzfest.com/wellness', 'https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out'}
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/
Traceback (most recent call last):
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/

Melonadev avatar Oct 19 '20 13:10 Melonadev

:-/

I haven't been able to reproduce the "Too Many Redirects" issue, and I haven't got a 520 error since updating the retry logic to cover 520s.
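For anyone following along, the idea behind retrying on 520s looks roughly like the sketch below. This is not the archiver's actual implementation (the debug logs suggest it goes through the `Retry` class used by requests); it is a plain-Python illustration. `fetch_with_retries` is a hypothetical name, and the default of 5 retries and 30 seconds of waiting are taken from the `Retry(total=5, ...)` log line and the `--rate-limit-wait 30` setting used in this thread.

```python
import time
from typing import Callable, Collection


def fetch_with_retries(
    fetch: Callable[[str], int],
    url: str,
    retry_statuses: Collection[int] = (520,),
    max_retries: int = 5,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch(url) again while it returns a retryable status, up to max_retries times."""
    status = fetch(url)
    for _ in range(max_retries):
        if status not in retry_statuses:
            break
        sleep(wait_seconds)  # back off before asking the server again
        status = fetch(url)
    return status
```

With these defaults a URL is requested at most six times (one initial attempt plus five retries), so more than five 520 responses in a row would still surface as an error.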

agude avatar Oct 24 '20 22:10 agude

I tried again and this time it seems to be a 520 error. Strange.

Microsoft Windows [Version 10.0.18363.1139]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\yewhe\Downloads\archiver files>archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG
DEBUG:root:Archiver Version: 1.9.0
DEBUG:root:Arguments: Namespace(archive_sitemap=False, file='./fest.txt', jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=30, sitemaps=[], urls=[])
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
INFO:root:Parsing sitemaps
INFO:root:Reading urls from file: ./fest.txt
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/wellness
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/hub
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2019-2
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-lineup
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/personnel
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/videos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/photos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/past-shows
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-saturday-jan-11th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/marathon-map
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/archive
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/about
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/sponsorship
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/contact
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2018
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year
DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest', 'https://web.archive.org/save/https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation', 'https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added', 'https://web.archive.org/save/https://www.winterjazzfest.com/past-shows', 'https://web.archive.org/save/https://www.winterjazzfest.com/about', 'https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-saturday-jan-11th', 'https://web.archive.org/save/https://www.winterjazzfest.com/archive', 'https://web.archive.org/save/https://www.winterjazzfest.com/hub', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th', 'https://web.archive.org/save/https://www.winterjazzfest.com/personnel', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018', 'https://web.archive.org/save/https://www.winterjazzfest.com/', 'https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th', 'https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids', 'https://web.archive.org/save/https://www.winterjazzfest.com/contact', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program', 'https://web.archive.org/save/https://www.winterjazzfest.com/wellness', 'https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd', 'https://web.archive.org/save/https://www.winterjazzfest.com/sponsorship', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup'}
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
Traceback (most recent call last):
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
Traceback (most recent call last):
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 244, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018

Melonadev avatar Oct 29 '20 06:10 Melonadev

Archiving with an XML sitemap also gives the 'Exceeded 30 redirects' error, after about 30-40 minutes. I converted the XML file to a txt file because GitHub doesn't support XML: ragina.txt

C:\Users\yewhe\Downloads\archiver files>archiver --sitemaps file://ragina.xml --rate-limit-wait 30
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 35, in call_archiver
    r = session.head(request_url, allow_redirects=True)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 553, in head
    return self.request('HEAD', url, **kwargs)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 518, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 137, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 244, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Melonadev avatar Nov 04 '20 10:11 Melonadev

An XML sitemap vs. a list of URLs shouldn't make a difference: both are processed offline (with good test coverage) into the same internal format and then pass through the same logic. I'll give this list a try.

Do you know which URL caused the redirect error?
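One way to find out which URL triggers the failure is to catch exceptions per URL instead of letting the first one propagate out of `pool.map`. The following is only a minimal sketch under assumptions: `safe_call`, `archive_all`, and the `archive_one` callable are hypothetical names, not part of wayback-machine-archiver, and a `ThreadPool` is swapped in for the process pool so that the lambda does not need to be pickled.

```python
import logging
from multiprocessing.pool import ThreadPool


def safe_call(archive_one, url):
    """Run archive_one(url); return (url, None) on success or (url, exc) on failure."""
    try:
        archive_one(url)
        return url, None
    except Exception as exc:  # deliberately broad: every failure should be reported
        logging.error("Failed on %s: %r", url, exc)
        return url, exc


def archive_all(archive_one, urls, workers=1):
    """Try every URL and return a list of (url, exception) pairs for the failures."""
    # ThreadPool (an assumption, not what the archiver uses) avoids pickling the lambda.
    with ThreadPool(workers) as pool:
        results = pool.map(lambda u: safe_call(archive_one, u), urls)
    return [(url, exc) for url, exc in results if exc is not None]
```

With something like this, a `TooManyRedirects` raised for one URL would be logged together with that URL, and the remaining URLs would still be attempted.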

agude avatar Nov 04 '20 17:11 agude

For the XML one (ragina.txt in my previous comment), no idea. This time it doesn't say which URL, unlike the txt one. EDIT: see my comment right below (nyman.txt)

Melonadev avatar Nov 05 '20 00:11 Melonadev

BUT this time it did display the problematic link for this file (also converted to txt because of GitHub): nyman.txt

Microsoft Windows [Version 10.0.18363.1171]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\yewhe\Downloads\archiver files>archiver --sitemaps file://nyman.xml --rate-limit-wait 30
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/http://www.michaelnyman.com/shop/soundtracks
Traceback (most recent call last):
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/http://www.michaelnyman.com/shop/soundtracks
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 35, in call_archiver
    r = session.head(request_url, allow_redirects=True)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 553, in head
    return self.request('HEAD', url, **kwargs)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 518, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 137, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 244, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Melonadev avatar Nov 05 '20 03:11 Melonadev

@Melonadev I've pushed v1.9.1, which increases the redirect limit from 30 to 100.

I don't expect this to fix the issue (if you have 30, I suspect it is actually an infinite redirect), but I don't know what else to try!
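For reference, in `requests` the limit is the `max_redirects` attribute of a `Session`, which defaults to 30. A minimal sketch of raising it follows; this is not necessarily how v1.9.1 does it, and `make_session` is a hypothetical helper name.

```python
import requests


def make_session(max_redirects=100):
    """Return a requests.Session that follows up to max_redirects redirects."""
    session = requests.Session()
    # requests defaults to 30; exceeding the limit raises TooManyRedirects.
    session.max_redirects = max_redirects
    return session
```

If the redirects really are an infinite loop, a higher limit only delays the `TooManyRedirects` error rather than avoiding it.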

agude avatar Nov 22 '20 06:11 agude

So, interesting that you noticed a failure with a higher wait time in #21 ...

Normally these infinite redirects occur when a site relies on a cookie to know that it has already redirected you and should stop. I wonder if, for long wait times, the cookie expires and so the loop never breaks.

I'll see if I can find some time over the holiday to try that out.
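To check whether a chain actually repeats (rather than just being long), one could walk the redirects one hop at a time and record the URLs seen. This is a rough sketch under assumptions: `head_location` and `trace_redirects` are hypothetical helpers, and `session` is assumed to be a `requests.Session` (or anything with a compatible `head` method).

```python
from urllib.parse import urljoin


def head_location(session, url):
    """Return the absolute redirect target of url, or None if it does not redirect."""
    resp = session.head(url, allow_redirects=False)
    location = resp.headers.get("Location")
    if resp.is_redirect and location:
        return urljoin(url, location)  # Location may be relative
    return None


def trace_redirects(start_url, next_location, max_hops=30):
    """Follow redirects hop by hop.

    Returns (chain, repeated): the URLs visited, and whether any URL came up twice.
    """
    chain = [start_url]
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        target = next_location(url)
        if target is None:
            return chain, False  # reached a response that does not redirect
        chain.append(target)
        if target in seen:
            return chain, True  # the same URL again: possibly a loop
        seen.add(target)
        url = target
    return chain, False  # long chain, but no repeat within max_hops
```

A call such as `trace_redirects(url, lambda u: head_location(session, u))` would then show the chain. A repeated URL is only suggestive: a site that bounces once to set a cookie also revisits a URL, so comparing runs with and without a long wait between requests would be needed to test the cookie theory.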

agude avatar Nov 25 '20 00:11 agude

This could be an issue with Wayback Machine itself and not your archiver, but I'm not sure.

Melonadev avatar Nov 25 '20 02:11 Melonadev

Something strange has been occurring for the past few days: every attempt to save a page directly on the Wayback Machine website using its online form returns 'Job failed.'

To be clear, it's not the fault of archiver, but a rather troubling issue with the Wayback Machine itself. This occurs on Microsoft Edge 87.0.664.47 and Firefox 83.0, both of which are up to date.

Melonadev avatar Nov 25 '20 12:11 Melonadev

I too have been having issues, but in the scheduled runs of my script:

  • https://twitter.com/RaspberryPion/status/1331522966396366849
  • https://twitter.com/RaspberryPion/status/1330084311110987777
  • https://twitter.com/RaspberryPion/status/1328634615423123457

🤷 Let me know if you figure anything out!

agude avatar Nov 25 '20 19:11 agude

I have contacted the Internet Archive team about this issue: [email protected] Hopefully they can provide an explanation or fix it promptly.

Melonadev avatar Nov 26 '20 05:11 Melonadev

The online form's working now!

Melonadev avatar Dec 03 '20 04:12 Melonadev

Glad to hear it!

agude avatar Dec 03 '20 05:12 agude

I am now running into this or a similar issue consistently. I am not able to use the API to save URLs. It was working, but I suddenly started running into this issue!

Name: waybackpy
Version: 3.0.6
Summary: Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily.
Home-page: https://akamhy.github.io/waybackpy/
Author: Akash Mahanty
Author-email: [email protected]
License: MIT
Location: /home/nat/.local/lib/python3.8/site-packages
Requires: click, requests, urllib3
Required-by: 
Note: you may need to restart the kernel to use updated packages.
<ipython-input-4-9a60b9f17f1c> in save_url(save_url)
      2     user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
      3     save_api = WaybackMachineSaveAPI(save_url)
----> 4     save_api.save()

~/.local/lib/python3.8/site-packages/waybackpy/save_api.py in save(self)
    208                 self.sleep(tries)
    209 
--> 210             self.get_save_request_headers()
    211             self.saved_archive = self.archive_url_parser()
    212 

~/.local/lib/python3.8/site-packages/waybackpy/save_api.py in get_save_request_headers(self)
     87         )
     88         session.mount("https://", HTTPAdapter(max_retries=retries))
---> 89         self.response = session.get(self.request_url, headers=self.request_headers)
     90         # requests.response.headers is requests.structures.CaseInsensitiveDict
     91         self.headers = self.response.headers

~/.local/lib/python3.8/site-packages/requests/sessions.py in get(self, url, **kwargs)
    540 
    541         kwargs.setdefault('allow_redirects', True)
--> 542         return self.request('GET', url, **kwargs)
    543 
    544     def options(self, url, **kwargs):

~/.local/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    527         }
    528         send_kwargs.update(settings)
--> 529         resp = self.send(prep, **send_kwargs)
    530 
    531         return resp

~/.local/lib/python3.8/site-packages/requests/sessions.py in send(self, request, **kwargs)
    665             # Redirect resolving generator.
    666             gen = self.resolve_redirects(r, request, **kwargs)
--> 667             history = [resp for resp in gen]
    668         else:
    669             history = []

~/.local/lib/python3.8/site-packages/requests/sessions.py in <listcomp>(.0)
    665             # Redirect resolving generator.
    666             gen = self.resolve_redirects(r, request, **kwargs)
--> 667             history = [resp for resp in gen]
    668         else:
    669             history = []

~/.local/lib/python3.8/site-packages/requests/sessions.py in resolve_redirects(self, resp, req, stream, timeout, verify, cert, proxies, yield_requests, **adapter_kwargs)
    164 
    165             if len(resp.history) >= self.max_redirects:
--> 166                 raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
    167 
    168             # Release the connection back into the pool.

TooManyRedirects: Exceeded 30 redirects.

failed_links

Natkeeran avatar Nov 18 '22 14:11 Natkeeran