Failed to dl my old site :/

Open jamieduk opened this issue 1 year ago • 13 comments

jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader jnet.sytes.net
Downloading jnet.sytes.net to websites/jnet.sytes.net/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
        from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
        from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
        from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
        from /usr/local/bin/wayback_machine_downloader:25:in `load'
        from /usr/local/bin/wayback_machine_downloader:25:in `<main>'

jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader http://jnet.sytes.net
Downloading http://jnet.sytes.net to websites/jnet.sytes.net/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
        from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
        from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
        from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
        from /usr/local/bin/wayback_machine_downloader:25:in `load'
        from /usr/local/bin/wayback_machine_downloader:25:in `<main>'

jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -d . http://jnet.sytes.net
Downloading http://jnet.sytes.net to ./ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
        from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
        from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
        from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
        from /usr/local/bin/wayback_machine_downloader:25:in `load'
        from /usr/local/bin/wayback_machine_downloader:25:in `<main>'

jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -a -d /home/jay/Downloads/jnet_site/archive_dl http://jnet.sytes.net
Downloading http://jnet.sytes.net to /home/jay/Downloads/jnet_site/archive_dl/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
        from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
        from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
        from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
        from /usr/local/bin/wayback_machine_downloader:25:in `load'
        from /usr/local/bin/wayback_machine_downloader:25:in `<main>'

jay@jnetreloaded:~/Downloads/jnet_site/archive_dl$ sudo wayback_machine_downloader -a -d ./websites http://jnet.sytes.net
Downloading http://jnet.sytes.net to ./websites/ from Wayback Machine archives.

Getting snapshot pages../usr/lib/ruby/3.2.0/open-uri.rb:369:in `open_http': 400 BAD REQUEST (OpenURI::HTTPError)
        from /usr/lib/ruby/3.2.0/open-uri.rb:760:in `buffer_open'
        from /usr/lib/ruby/3.2.0/open-uri.rb:214:in `block in open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `catch'
        from /usr/lib/ruby/3.2.0/open-uri.rb:212:in `open_loop'
        from /usr/lib/ruby/3.2.0/open-uri.rb:153:in `open_uri'
        from /usr/lib/ruby/3.2.0/open-uri.rb:740:in `open'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader/archive_api.rb:13:in `get_raw_list_from_api'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:92:in `block in get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `times'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:91:in `get_all_snapshots_to_consider'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:105:in `get_file_list_curated'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:164:in `get_file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:309:in `file_list_by_timestamp'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb:192:in `download_files'
        from /var/lib/gems/3.2.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:72:in `<top (required)>'
        from /usr/local/bin/wayback_machine_downloader:25:in `load'
        from /usr/local/bin/wayback_machine_downloader:25:in `<main>'

jamieduk avatar Nov 27 '24 13:11 jamieduk

@afongemie you mean wayback_machine_downloader jnet.sytes.net? Have you actually tried it? It raises the same error.

dmikhaylov avatar Nov 29 '24 19:11 dmikhaylov

Same here :-(

It seems that the structure of the Wayback Machine archive service changed a bit...

In wayback_machine_downloader.rb (located at /Users/user/.gem/ruby/2.6.0/gems/wayback_machine_downloader-2.3.1/lib/wayback_machine_downloader.rb if you installed it there), you can replace the function get_all_snapshots_to_consider with this:

  def get_all_snapshots_to_consider
    # Note: Passing a page index parameter allows us to get more snapshots,
    # but from a less fresh index
    print "Getting snapshot pages"
    snapshot_list_to_consider = []
    snapshot_list_to_consider += get_raw_list_from_api(@base_url, nil)
    print "."
    unless @exact_url
      # Original pagination loop, disabled because requests made inside it
      # now fail with 400 BAD REQUEST:
      # @maximum_pages.times do |page_index|
      #   snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
      #   break if snapshot_list.empty?
      #   snapshot_list_to_consider += snapshot_list
      #   print "."
      # end

      # Workaround: request only the first page (index 0).
      page_index = 0
      snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
      snapshot_list_to_consider += snapshot_list
      print "."
    end
    puts " found #{snapshot_list_to_consider.length} snapshots to consider."
    puts
    snapshot_list_to_consider
  end

It downloads everything, BUT THE LINKS ARE NOT PRESERVED!
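A possible alternative to fetching only page 0 is to keep the paging loop and stop at the first HTTP error. The sketch below is only an illustration of that idea, not code from the gem: collect_snapshot_pages and fake_api are hypothetical names, and fake_api merely imitates an API that rejects every page after the first with a 400.

```ruby
require 'open-uri'

# Hypothetical helper (not part of wayback_machine_downloader): yields each
# page index to the caller's block and collects the returned rows until a
# page is empty or the request raises an HTTP error.
def collect_snapshot_pages(max_pages)
  snapshots = []
  max_pages.times do |page_index|
    begin
      page = yield(page_index)
    rescue OpenURI::HTTPError => e
      # Keep what was collected so far instead of crashing.
      warn "Stopping at page #{page_index}: #{e.message}"
      break
    end
    break if page.empty?
    snapshots += page
  end
  snapshots
end

# Stand-in for the real API call; assumed behaviour for illustration only.
fake_api = lambda do |page_index|
  raise OpenURI::HTTPError.new('400 BAD REQUEST', nil) if page_index >= 1
  [['20240101000000', 'http://example.com/']]
end

result = collect_snapshot_pages(100) { |page_index| fake_api.call(page_index) }
puts result.length # prints 1
```

In the gem itself, the block would presumably wrap the existing get_raw_list_from_api call; whether later pages ever succeed again depends on the archive service.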

fredericschmidt avatar Dec 15 '24 18:12 fredericschmidt

https://github.com/StrawberryMaster/wayback-machine-downloader works for me.

acenturyandabit avatar Dec 21 '24 07:12 acenturyandabit

How can I get this working in Windows, please?

I have installed Ruby but have no idea where to go from here.

StrawberryMaster did not include much documentation, unfortunately.

kingmustard avatar Jan 23 '25 15:01 kingmustard

Oops, sorry about that @kingmustard. First, make sure Ruby is indeed installed: run ruby -v and see if it prints a version number. If so, Ruby is working.

Assuming you used the default Ruby installer, Bundler is probably included, so you can type bundle install and press Enter to download the dependencies. (If that doesn't work, run gem install bundler and then follow these steps again.) When that's done, navigate to the folder you extracted WMD's contents to. For example, if you extracted it to a "WMD" folder within your Downloads directory, open your terminal (either Command Prompt or PowerShell works) and type cd Downloads\WMD\bin. Alternatively, go to the WMD\bin folder directly, Shift + Right Click anywhere inside it, and click the "Open using Windows Terminal/PowerShell" button.

If that worked, you can run ruby wayback_machine_downloader http://example.com (or whatever site you want) and it should work. I've also updated the original documentation so others don't get lost in the future.

StrawberryMaster avatar Jan 23 '25 17:01 StrawberryMaster

I appreciate your help.

'bundle install' did not work but 'gem install bundler' did.

I cannot find anywhere to download WMD on https://github.com/StrawberryMaster/wayback-machine-downloader. There is nothing in the 'Releases' section.

kingmustard avatar Jan 24 '25 00:01 kingmustard

@kingmustard I should probably fix that. There should be something in the Releases section now. If that doesn't work, you can always click the Code button and switch to the Local tab. There should be a "Download ZIP" button there.

(If something isn't working, feel free to run bundle install now that you got bundler installed, and then run WMD.)

StrawberryMaster avatar Jan 24 '25 01:01 StrawberryMaster

I think we are getting closer 😊 However:

PS C:\Users\Elliot\Desktop\wmd\bin> ruby wayback_machine_downloader http://elliotsworld.co.uk
<internal:C:/Ruby33-x64/lib/ruby/site_ruby/3.3.0/rubygems/core_ext/kernel_require.rb>:136:in `require': cannot load such file -- concurrent-ruby (LoadError)
        from <internal:C:/Ruby33-x64/lib/ruby/site_ruby/3.3.0/rubygems/core_ext/kernel_require.rb>:136:in `require'
        from C:/Users/Elliot/Desktop/wmd/lib/wayback_machine_downloader.rb:10:in `<top (required)>'
        from wayback_machine_downloader:3:in `require_relative'
        from wayback_machine_downloader:3:in `<main>'
PS C:\Users\Elliot\Desktop\wmd\bin>

kingmustard avatar Jan 24 '25 08:01 kingmustard

@kingmustard Weird. You can just install concurrent-ruby then, since somehow that's missing, using gem install concurrent-ruby -v 1.3.5 and it should probably work.
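To see which of the gems mentioned in this thread are visible to the Ruby install being used, something like the following could be run first. It is a minimal sketch: gem_installed? is a hypothetical helper name, and the gem list is just the two names that came up above.

```ruby
require 'rubygems'

# Hypothetical helper: true if at least one version of the named gem is
# installed for the Ruby interpreter running this script.
def gem_installed?(name)
  Gem::Specification.find_all_by_name(name).any?
end

%w[bundler concurrent-ruby].each do |name|
  status = gem_installed?(name) ? 'installed' : 'missing'
  puts "#{name}: #{status}"
end
```

A gem reported as missing here can then be installed with gem install, as described above.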

StrawberryMaster avatar Jan 24 '25 11:01 StrawberryMaster

Hi there,

PS C:\WINDOWS\system32> ruby wayback_machine_downloader http://elliotsworld.co.uk
wayback_machine_downloader: --> wayback_machine_downloader
expected a newline or semicolon after the statementcannot parse the expression
> 1        PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
> 2        957       1     957      87932  cons0     197608 13:40:12 /usr/bin/ps

wayback_machine_downloader:2: syntax error, unexpected integer literal, expecting end-of-input (SyntaxError)
      957       1     957      87932  cons0   ...

kingmustard avatar Jan 28 '25 19:01 kingmustard

@kingmustard My guess is that you're in the wrong folder here. You should be running it from the place you extracted the folder to, not System32 (which would be a pretty dangerous place to have it!)

If you extracted it to a folder named "WMD" in your Downloads folder, for example, you'd need to do this in PowerShell:

cd \
cd C:\Users\YOURPROFILENAMEHERE\Downloads\WMD

or just go to your WMD folder in File Explorer, copy the folder's path, and run cd followed by the path you copied. From there, go to the bin folder (cd bin) and run the commands as normal.
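A quick way to confirm the terminal is in the right place before running anything is to check that the launcher file exists in the current directory. This is a hypothetical pre-flight check, not something shipped with WMD; the file name is taken from the commands above.

```ruby
# Name of the launcher script, as used in the commands in this thread.
LAUNCHER = 'wayback_machine_downloader'

# Hypothetical helper: true if `dir` contains the launcher as a regular file.
def launcher_here?(dir = Dir.pwd)
  File.file?(File.join(dir, LAUNCHER))
end

if launcher_here?
  puts "Found #{LAUNCHER} in #{Dir.pwd}"
else
  puts "No #{LAUNCHER} in #{Dir.pwd}; cd into the extracted bin folder first"
end
```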

StrawberryMaster avatar Jan 29 '25 00:01 StrawberryMaster

Unfortunately, this is too confusing to use without an exe / GUI.

I appreciate the help you have given me and I hope someone makes a fork some time in the future 😊

kingmustard avatar Jan 29 '25 07:01 kingmustard

@kingmustard Fair! I guess I can look into that and see if it makes things easier. I'm not aware of a fork with a GUI, but there is a Python alternative which you may find easier to install.

StrawberryMaster avatar Jan 29 '25 11:01 StrawberryMaster

related: https://github.com/hartator/wayback-machine-downloader/issues/307

cirosantilli avatar May 22 '25 11:05 cirosantilli