Implement Net::HTTP to resolve rate limiting

Open ShiftaDeband opened this issue 1 year ago • 9 comments

This is all based on https://github.com/hartator/wayback-machine-downloader/issues/267#issuecomment-1868090089 and @ee3e's work.

This resolves all rate limiting issues without the need for any delays or sleeps.

I am not sure that the http.finish() line in get_raw_list_from_api is in the correct place, so any code review would be helpful.

Regardless, I thought I'd submit this to try to resolve several of the issues that have come up lately.

Legitimately, all credit should go to @ee3e for their solution. This helped me download a ridiculously large backup without issue (452,831 files).

(Issues) Resolves #277, resolves #275, resolves #273, resolves #269, resolves #267

(Pull requests) Resolves #268, resolves #266, resolves #262 (at least according to comments)

ShiftaDeband avatar Feb 08 '24 05:02 ShiftaDeband

Awesome, working fine! But I'm interested in why using Net::HTTP overcomes the rate limiting. Do you have any idea what the initial problem was?

bitdruid avatar Feb 14 '24 12:02 bitdruid

Awesome, working fine! But I'm interested in why using Net::HTTP overcomes the rate limiting. Do you have any idea what the initial problem was?

Essentially, we're using one persistent HTTP session for the whole download (both snapshots and pages) and keeping it open until it's complete, rather than opening and closing many separate sessions, which the Wayback Machine doesn't like (even if you're using a legitimate browser!).
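For anyone curious what that pattern looks like in plain Ruby, here is a minimal sketch, not the actual code from this branch. The helper names `fetch_all` and `with_persistent_connection` are made up for illustration, and the path in the usage comment is only a placeholder. It shows one way to open a single `Net::HTTP` connection, reuse it for every request, and call `finish` once at the end inside an `ensure` block.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not from the PR): send one GET per path over a
# connection that is already open. `http` is normally a started Net::HTTP
# object, but anything that responds to #request works.
def fetch_all(http, paths)
  paths.map do |path|
    response = http.request(Net::HTTP::Get.new(path))
    [path, response.code, response.body]
  end
end

# Hypothetical wrapper (not from the PR): open the connection once, hand it
# to the block, and close it only after the block has finished.
def with_persistent_connection(host)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.start # opens the TCP/TLS connection a single time
  begin
    yield http
  ensure
    # finish raises IOError on a connection that was never started,
    # so guard it with started?
    http.finish if http.started?
  end
end

# Usage (performs real network requests, so it is left commented out;
# the path below is a placeholder):
# with_persistent_connection("web.archive.org") do |http|
#   fetch_all(http, ["/cdx/search/cdx?url=example.com&limit=1"])
# end
```

Putting `finish` in an `ensure` block is one way to guarantee the connection is closed exactly once, even if a request raises partway through.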

ShiftaDeband avatar Mar 01 '24 22:03 ShiftaDeband

Until this gets merged and released, can you provide instructions for a non-Ruby person to run this branch?

greggles avatar May 18 '24 14:05 greggles

Until this gets merged and released, can you provide instructions for a non-Ruby person to run this branch?

There are instructions in #281

greggles avatar May 18 '24 14:05 greggles

Until this gets merged and released, can you provide instructions for a non-Ruby person to run this branch?

Because this project has had no updates for the last 3 years, I've written a replacement in Python for my needs... it seems dead.

bitdruid avatar May 18 '24 15:05 bitdruid

Finished a 3,000,000 snapshot download thanks to this. Much appreciated.

tlorien avatar Aug 23 '24 14:08 tlorien