wayback-machine-downloader
Error 503
When I run the tool, I get a 503 error from open-uri.rb during the snapshot phase. Earlier forum posts describe this as a known issue and suggest waiting, but I have been trying for 3 days. Thanks.
Here is the info:
Getting snapshot pages
C:/Ruby32-x64/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 503 Service Temporarily Unavailable (OpenURI::HTTPError)
from C:/Ruby32-x64/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
from C:/Ruby32-x64/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
from C:/Ruby32-x64/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
from C:/Ruby32-x64/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
from C:/Ruby32-x64/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
from C:/Ruby32-x64/lib/ruby/3.2.0/open-uri.rb:740:in `open'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:88:in `get_all_snapshots_to_consider'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:131:in `get_file_list_all_timestamps'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:158:in `get_file_list_by_timestamp'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
from C:/Ruby32-x64/bin/wayback_machine_downloader:32:in `load'
from C:/Ruby32-x64/bin/wayback_machine_downloader:32:in
I am getting the same now
archive.org recently implemented rate limiting that blocks clients making too many requests in a short timeframe. Try the fixes implemented in #268 or #266 and see if they work for you.
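For anyone curious what coping with this kind of rate limiting can look like on the client side, here is a minimal sketch of retrying with a growing delay. It is not the patch from #268 or #266; the helper name `with_retry_on_503`, its parameters, and the delay values are all made up for illustration. It relies only on the fact, visible in the trace above, that open-uri raises OpenURI::HTTPError with the status line as the message.

```ruby
require "open-uri"

# Hypothetical helper, NOT part of wayback_machine_downloader.
# Runs the block and retries it when open-uri reports HTTP 503,
# doubling the wait before each new attempt.
def with_retry_on_503(max_attempts: 5, base_delay: 4, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue OpenURI::HTTPError => e
    # The message starts with the status code, e.g. "503 Service Temporarily Unavailable".
    # Any other HTTP error, or a 503 on the last attempt, is re-raised unchanged.
    raise unless e.message.start_with?("503") && attempt < max_attempts
    sleeper.call(base_delay * (2**(attempt - 1))) # 4s, 8s, 16s, ...
    retry
  end
end

# Possible usage (the URL variable is a placeholder):
#   body = with_retry_on_503 { URI.open(some_url).read }
```

The `sleeper` parameter exists only so the waiting can be swapped out when trying the helper without real delays.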
Thanks, I was facing a similar issue, but the patch in #268 allowed me to download the whole website without problems. I suggest the patch should be included in the code asap.
@sww1235 - thanks for the fix. Looks promising! I would love to test it, but I don't know how to install it (on Ubuntu 22.04).
This won't do the trick:
gem 'wayback_machine_downloader', git: 'git://github.com/sww1235/wayback-machine-downloader.git, branch: 'configurable_delay'
Any advice?
@Tiptop4792 Download the source code of the branch you need (https://github.com/sww1235/wayback-machine-downloader/tree/configurable_delay). Then, from the source directory, run:
gem build wayback_machine_downloader.gemspec
gem install [whatever the name of the resulting file].gem
Thanks so much, @ingvarr777!
I had to use "gem build" instead of "build". Maybe you just missed it. Anyhow, really cool. Thanks!
Do you know how to download the source code via git? I had to download the source directory as a zip file, since
git clone https://github.com/sww1235/wayback-machine-downloader/tree/configurable_delay
didn't work.
Also, @sww1235, your fix works so far. Really cool! One minor issue: the -n option didn't default to 4 seconds as stated in the manual. Running with a bare '-n' raised an error instead:
wayback_machine_downloader "example.com" -s -n
/var/lib/gems/3.0.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:68:in `<top (required)>': missing argument: -n (OptionParser::MissingArgument)
from /usr/local/bin/wayback_machine_downloader:25:in `load'
from /usr/local/bin/wayback_machine_downloader:25:in `'
@Tiptop4792 You're welcome
Thanks for the correction, edited my previous post in case someone else needs it.
To answer your question:
git clone -b configurable_delay https://github.com/sww1235/wayback-machine-downloader.git
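If you would rather go through Bundler than clone by hand, a Gemfile along these lines may work. This is a sketch under assumptions: it expects a Gemfile in a working directory of your own and relies on the gemspec sitting in the repository root. It uses https because GitHub no longer serves the unauthenticated git:// protocol.

```ruby
# Gemfile (sketch; adjust to your setup)
source "https://rubygems.org"

# Pull the gem straight from the fork's branch instead of rubygems.org.
gem "wayback_machine_downloader",
    git: "https://github.com/sww1235/wayback-machine-downloader.git",
    branch: "configurable_delay"
```

After `bundle install`, the tool would be run as `bundle exec wayback_machine_downloader "example.com" -s` from the same directory.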
And 4 seconds is the default if you don't pass -n. Like this:
wayback_machine_downloader "example.com" -s
You need to use -n
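For background, OptionParser::MissingArgument is what Ruby's OptionParser raises when an option is declared with a required value and none is supplied. Here is a minimal sketch of the alternative, where the value is declared optional so a bare -n falls back to the default. The function `parse_delay` is hypothetical and is not the code from the configurable_delay branch; only the -n flag and the 4-second default come from this thread.

```ruby
require "optparse"

# Hypothetical parser, NOT taken from wayback_machine_downloader.
# Returns the delay in seconds requested on the command line.
def parse_delay(argv, default: 4.0)
  delay = default
  parser = OptionParser.new do |opts|
    # The square brackets mark the value as optional. Without them,
    # a bare -n raises OptionParser::MissingArgument.
    opts.on("-n [SECONDS]", Float, "Seconds to wait between requests") do |value|
      delay = value || default # value is nil when -n is given alone
    end
  end
  parser.parse!(argv.dup) # dup so the caller's array is left untouched
  delay
end
```

With a declaration like this, omitting -n and passing a bare -n both give the default, while `-n 10` overrides it.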
Glad my fix worked for you. Now if we could get it merged and released...
Thank you! I had already started looking for other programs. I would also like to see proxy support in the program, with automatic proxy rotation at set time intervals.
If I run a list of commands like this one after another:
wayback_machine_downloader site1.com/file
wayback_machine_downloader site2.com/file
wayback_machine_downloader site3.com/file
then wayback_machine_downloader fails with a 503 error.
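Given the rate limiting mentioned earlier in the thread, one thing to try is putting a pause between the runs. A minimal sketch follows; `run_spaced` is a hypothetical wrapper, the 30-second pause is a guess rather than a documented archive.org limit, and whether it avoids the 503 is untested.

```ruby
# Hypothetical wrapper, NOT part of wayback_machine_downloader.
# Runs the downloader once per target and pauses between runs.
def run_spaced(targets, pause: 30,
               runner: ->(t) { system("wayback_machine_downloader", t) },
               sleeper: ->(s) { sleep(s) })
  targets.each_with_index.map do |target, index|
    sleeper.call(pause) if index > 0 # no pause before the first run
    runner.call(target)              # system returns true when the command succeeds
  end
end

# Possible usage:
#   run_spaced(%w[site1.com/file site2.com/file site3.com/file])
```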