wayback-machine-downloader
Implement Net::HTTP to resolve rate limiting
This is all based on https://github.com/hartator/wayback-machine-downloader/issues/267#issuecomment-1868090089 and @ee3e's work.
This resolves all rate limiting issues without the need for any delays or sleeps.
I am not sure that the http.finish() line in get_raw_list_from_api is in the correct place, so any code review would be helpful.
Regardless, I thought I'd submit this to try to resolve several of the issues that have come up lately.
All credit should legitimately go to @ee3e for their solution. It let me download a ridiculously large backup (452,831 files) without issue.
(Issues) Resolves #277, resolves #275, resolves #273, resolves #269, resolves #267
(Pull requests) Resolves #268, resolves #266, resolves #262 (at least according to comments)
Awesome, working fine! But I'm interested in why using Net::HTTP overcomes the rate limiting. Do you have any idea what the initial problem was?
Essentially, we're using one persistent HTTP session to download the whole thing (both snapshots and pages) and keeping it open until the download is complete, rather than opening and closing many separate sessions, which the Wayback Machine doesn't like (even if you're using a legitimate browser!).
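To make the connection-reuse idea concrete, here is a minimal sketch in plain Ruby using only the standard `net/http` and `socket` libraries. It is not the PR's code: `ConnectionCountingServer`, `fetch_persistent` and `fetch_one_shot` are hypothetical names, and the local server only counts TCP connections; it does not model the Wayback Machine's actual rate limiting. It shows that one `Net::HTTP.start` session serves several requests over a single connection, while calling `Net::HTTP.get` per URL opens a new connection each time.

```ruby
require "net/http"
require "socket"

# Hypothetical stand-in server on localhost: speaks just enough HTTP/1.1 to
# answer GET requests, keeps each connection open, and counts how many TCP
# connections it accepts. It does not model the Wayback Machine's rate limits.
class ConnectionCountingServer
  attr_reader :port

  def initialize
    @server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
    @port = @server.local_address.ip_port
    @connections = 0
    @mutex = Mutex.new
    @acceptor = Thread.new { accept_loop }
  end

  def connections
    @mutex.synchronize { @connections }
  end

  def stop
    @acceptor.kill
    @server.close
  end

  private

  def accept_loop
    loop do
      client = @server.accept
      @mutex.synchronize { @connections += 1 }
      Thread.new(client) { |sock| serve(sock) }
    end
  end

  # Answer every request on one socket until the client closes it.
  def serve(sock)
    while sock.gets # request line, e.g. "GET /a HTTP/1.1"
      loop do # skip headers up to the blank line that ends them
        line = sock.gets
        break if line.nil? || line == "\r\n"
      end
      sock.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    end
  ensure
    sock.close
  end
end

# Hypothetical helper: one session (one TCP connection) reused for every path.
# The block form of Net::HTTP.start finishes the session when the block exits.
def fetch_persistent(host, port, paths)
  Net::HTTP.start(host, port) do |http|
    paths.map { |path| http.get(path).body }
  end
end

# Hypothetical helper: Net::HTTP.get opens and closes a new connection per call.
def fetch_one_shot(host, port, paths)
  paths.map { |path| Net::HTTP.get(host, path, port) }
end

server = ConnectionCountingServer.new
paths = ["/a", "/b", "/c"]

before = server.connections
persistent_bodies = fetch_persistent("127.0.0.1", server.port, paths)
persistent_conns = server.connections - before

before = server.connections
one_shot_bodies = fetch_one_shot("127.0.0.1", server.port, paths)
one_shot_conns = server.connections - before

server.stop
puts "persistent: #{persistent_conns} connection(s), one-shot: #{one_shot_conns}"
# → persistent: 1 connection(s), one-shot: 3
```

Because the block form of `Net::HTTP.start` calls `finish` in an `ensure` when the block returns or raises, wrapping a whole batch of requests in one block is one way to avoid placing an explicit `http.finish()` call by hand.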
Until this gets merged and released, can you provide instructions for a non-Ruby person to run this branch?
There are instructions in #281
Because this project hasn't had any updates in the last 3 years, I've written a replacement in Python for my needs... it seems dead.
Finished a 3,000,000 snapshot download thanks to this. Much appreciated.