wayback-machine-downloader
400 Bad Request?
Hi, I am having a bit of trouble with my restore. I believe I installed both wayback_machine_downloader and Ruby 2.7.2p137 correctly. The problem is with the last step: I am trying to recover a site that was recently archived, and I don't understand why I get this error. Does anyone have a solution?
Getting snapshot pagesC:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader/archive_api.rb:8: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
Traceback (most recent call last):
17: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `<main>'
16: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `load'
15: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/bin/wayback_machine_downloader:72:in `<top (required)>'
14: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:192:in `download_files'
13: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
12: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:168:in `get_file_list_by_timestamp'
11: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
10: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
9: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
8: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:19:in `open'
7: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:50:in `open'
6: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:744:in `open'
5: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:174:in `open_uri'
4: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `open_loop'
3: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `catch'
2: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:235:in `block in open_loop'
1: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:764:in `buffer_open'
C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:387:in `open_http': 400 Bad Request (OpenURI::HTTPError)
Me too, on Ubuntu 20.04 with Ruby 2.7. I changed open() to URI.open(), but I still get the error:
16: from /usr/local/bin/wayback_machine_downloader:23:in `<main>'
15: from /usr/local/bin/wayback_machine_downloader:23:in `load'
14: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/bin/wayback_machine_downloader:72:in `<top (required)>'
13: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:192:in `download_files'
12: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
11: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:168:in `get_file_list_by_timestamp'
10: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
9: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
8: from /var/lib/gems/2.7.0/gems/wayback_machine_downloader-2.2.1/lib/wayback_machine_downloader/archive_api.rb:8:in `get_raw_list_from_api'
7: from /usr/lib/ruby/2.7.0/open-uri.rb:50:in `open'
6: from /usr/lib/ruby/2.7.0/open-uri.rb:744:in `open'
5: from /usr/lib/ruby/2.7.0/open-uri.rb:174:in `open_uri'
4: from /usr/lib/ruby/2.7.0/open-uri.rb:233:in `open_loop'
3: from /usr/lib/ruby/2.7.0/open-uri.rb:233:in `catch'
2: from /usr/lib/ruby/2.7.0/open-uri.rb:235:in `block in open_loop'
1: from /usr/lib/ruby/2.7.0/open-uri.rb:764:in `buffer_open'
/usr/lib/ruby/2.7.0/open-uri.rb:387:in `open_http': 400 Bad Request (OpenURI::HTTPError)
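For anyone following along, here is a minimal sketch of the open() to URI.open() change mentioned above. It is not the gem's actual code: the helper names `build_cdx_uri` and `fetch_snapshot_list`, the CDX endpoint URL, and the `fl` field list are assumptions for illustration. Note that this change only addresses the Ruby 2.7 deprecation warning; as the traceback above shows, the 400 Bad Request can still occur afterwards.

```ruby
require "open-uri"
require "uri"

# Hypothetical helper (not the gem's code): build a CDX query URL with
# the site URL escaped as a query parameter.
# The endpoint and the "fl" field list are assumptions for this sketch.
def build_cdx_uri(site_url)
  uri = URI("https://web.archive.org/cdx/search/cdx")
  uri.query = URI.encode_www_form(url: site_url, fl: "timestamp,original")
  uri
end

# Opening a URL through Kernel#open is deprecated in Ruby 2.7 and was
# removed in Ruby 3.0; URI.open is the direct replacement.
def fetch_snapshot_list(site_url)
  URI.open(build_cdx_uri(site_url).to_s).read
end
```

Building the query with URI.encode_www_form keeps characters such as ":" and "/" in the site URL from ending up unescaped in the request.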
Please retry with the latest version 2.3.0, it might work better.
Same problem on my Mac and my Windows 10 VM:
OS: macOS High Sierra 10.13.6
wayback_machine_downloader: 2.3.0
Ruby: ruby 2.3.7p456 (2018-03-28 revision 63024) [universal.x86_64-darwin17]
Getting snapshot pages/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:359:in `open_http': 400 Bad Request (OpenURI::HTTPError)
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:737:in `buffer_open'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:212:in `block in open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:210:in `catch'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:210:in `open_loop'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:151:in `open_uri'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:717:in `open'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:192:in `download_files'
from /Library/Ruby/Gems/2.3.0/gems/wayback_machine_downloader-2.3.0/bin/wayback_machine_downloader:72:in `<top (required)>'
from /usr/local/bin/wayback_machine_downloader:22:in `load'
from /usr/local/bin/wayback_machine_downloader:22:in `<main>'
OS: Windows 10
wayback_machine_downloader: 2.3.0
Ruby: 2.7 (x64, going by the C:/Ruby27-x64 install path in the traceback)
Getting snapshot pagesTraceback (most recent call last):
15: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `<main>'
14: from C:/Ruby27-x64/bin/wayback_machine_downloader:23:in `load'
13: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/bin/wayback_machine_downloader:72:in `<top (required)>'
12: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:192:in `download_files'
11: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
10: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
9: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
8: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
7: from C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/wayback_machine_downloader-2.3.0/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
6: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:744:in `open'
5: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:174:in `open_uri'
4: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `open_loop'
3: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:233:in `catch'
2: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:235:in `block in open_loop'
1: from C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:764:in `buffer_open'
C:/Ruby27-x64/lib/ruby/2.7.0/open-uri.rb:387:in `open_http': 400 Bad Request (OpenURI::HTTPError)
Any idea what might help?
Update:
I have tried the Docker container as well and ran into the same error. This might be related to the website I am trying to download: http:/pondini.org
That site works fine for me on Linux.
I have created a branch that adds some debugging to the API code. It will still crash, but if you could copy the debugging output before the Traceback, then that would help figure out where the issue is and if this is a bug in the wayback-machine-downloader code or in the API itself. Please try it out on your system:
https://github.com/pabs3/wayback-machine-downloader/tree/debug-crashes
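To illustrate the kind of debugging output meant here, a rough sketch follows. It is not the code on the branch above: `fetch_with_debug` and its `log` and `opener` parameters are hypothetical names. The idea is simply to print the request URL before fetching, so it is visible even when the fetch raises.

```ruby
require "open-uri"

# Hypothetical wrapper (not the code on the debug branch): log the request
# URL before fetching, log the HTTP error if one occurs, then re-raise so
# the usual traceback still appears.
# `opener` is injectable so the function can be exercised without network.
def fetch_with_debug(request_url, log: $stderr, opener: URI.method(:open))
  log.puts "DEBUG: requesting #{request_url}"
  opener.call(request_url).read
rescue OpenURI::HTTPError => e
  log.puts "DEBUG: request failed: #{e.message}"
  raise
end
```

With something like this in place, the last DEBUG line before the Traceback is the URL that the Archive API rejected.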
Hi,
Sorry, but the problem is gone now. I tried the debug version and it worked without problems, so I went back to the original version and it worked as well.
As far as I can tell, something must have changed on the Wayback Machine side.
Thanks for testing, I think you could be right. I think that the wayback-machine-downloader should do better than crashing when this happens though. I'll try to track down some IA folks to ask about it.
-- bye, pabs
https://bonedaddy.net/pabs3/
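As a sketch of what "better than crashing" could look like, assuming the 400 is transient as it was in this thread: retry a few times, then report the failing URL and carry on. This is not the gem's code; `fetch_or_report` and its `failed_urls`, `attempts`, `wait`, and `opener` parameters are hypothetical.

```ruby
require "open-uri"

# Hypothetical helper (not the gem's code): fetch a URL, retrying on HTTP
# errors. After the last failed attempt, record the URL in `failed_urls`,
# print a short message instead of a traceback, and return nil.
def fetch_or_report(request_url, failed_urls:, attempts: 3, wait: 2,
                    opener: URI.method(:open))
  tries = 0
  begin
    tries += 1
    opener.call(request_url).read
  rescue OpenURI::HTTPError => e
    if tries < attempts
      sleep wait # simple fixed pause between attempts
      retry
    end
    failed_urls << request_url
    warn "Archive API request failed (#{e.message}): #{request_url}"
    nil
  end
end
```

Collecting the failing URLs in an array also makes it easy to paste them into a report if the error comes back.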
Did anyone record any of the IA URLs that gave a 400 error?