Added Configurable Delay option

Open sww1235 opened this issue 2 years ago • 9 comments

This helps avoid the rate-limiting introduced by archive.org

Alternative to #266 that allows for a configurable delay rather than a hardcoded one. It also applies the same delay when fetching snapshots.

Closes #267, maybe #264, #244 and #246

Reference to archive.org implementing rate limiting: https://archive.org/details/toomanyrequests_20191110

sww1235 avatar Nov 16 '23 04:11 sww1235

I was unable to run rake on this due to an error with rake itself. I don't think I broke any of the tests, but let me know if you need anything fixed.

sww1235 avatar Nov 16 '23 04:11 sww1235

I picked -n as the short form of the delay option, based on the Linux nice command. The other good options were already taken. Feel free to change this if desired.

sww1235 avatar Nov 16 '23 04:11 sww1235

@sww1235 it would be nice to update README.md too with your new config option.

lcorbasson avatar Nov 25 '23 15:11 lcorbasson

I think it would be nice to include a message on "connection refused" too: "Connection was refused. You may be rate limited. Try increasing the rate-limit value. See: github.com/foo/bar."
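A minimal sketch of how such a hint could be wired in, assuming the refusal surfaces as Ruby's standard `Errno::ECONNREFUSED`. The helper name `with_rate_limit_hint` and the `out:` parameter are hypothetical and not part of the project's code.

```ruby
# Hypothetical helper: wraps any block that makes a network request and
# prints a rate-limit hint if the connection is refused.
def with_rate_limit_hint(out: $stderr)
  yield
rescue Errno::ECONNREFUSED
  out.puts "Connection was refused. You may be rate limited. " \
           "Try increasing the delay value."
  raise # re-raise so the caller can still decide to retry or abort
end
```

The error is re-raised on purpose, so the hint adds information without changing how the caller handles the failure.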

MatthewTingum avatar Nov 28 '23 05:11 MatthewTingum

Note: you should add the download delay after the check if the file exists (this line): https://github.com/hartator/wayback-machine-downloader/pull/268/files#diff-012e3d978c45d5eff042c16d88ed89dd9e302c0d3fa43df46a87f82f957fafacL266

Otherwise you will be waiting a long time if you are resuming a partial download.

There should also be a configurable number of retry attempts per file.
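A rough sketch of that ordering, with the existence check first, the delay only after a real request, and a bounded number of retries. All names here (`download_all`, `fetch`, `exists`, `pause`, `max_retries`) are hypothetical and only illustrate the idea; they are not the PR's actual code.

```ruby
# Hypothetical sketch: `fetch`, `exists`, and `pause` are injected callables
# so the ordering can be exercised without network access or real sleeps.
def download_all(files, delay:, max_retries:, fetch:,
                 exists: ->(path) { File.exist?(path) },
                 pause: ->(seconds) { sleep(seconds) })
  failed = []
  files.each do |file|
    # Skip before pausing: no request is sent, so no delay is needed.
    next if exists.call(file[:path])

    attempts = 0
    begin
      attempts += 1
      fetch.call(file)
    rescue StandardError
      if attempts <= max_retries
        pause.call(delay) # wait before trying the same file again
        retry
      end
      failed << file[:path] # give up on this file after the last retry
    end

    # Pause only after a request was actually attempted.
    pause.call(delay)
  end
  failed
end
```

With this ordering, resuming a partial download passes over existing files immediately and only spends time waiting on files that are actually requested.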

Theta-Dev avatar Dec 02 '23 18:12 Theta-Dev

Great fork, has completely solved the issues I was having before. Thank you.

JomSpoons avatar Dec 18 '23 18:12 JomSpoons

@sww1235 you are referring to the submit rate limit, which was implemented quite some time ago. It would be good to know what the actual download rate limit is, to make sure the 4-second default is actually a sane delay.

With https://github.com/hartator/wayback-machine-downloader/issues/267#issuecomment-1868090089 it might be that this work-around is no longer essential, but it would be good to have either way.

As for the parameter naming, what about --download-interval (or --interval if you insist on a short name) and -i, to have more self-explanatory names?
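For illustration, a sketch of how the suggested names could be declared with Ruby's standard `OptionParser`. The `parse_delay_options` helper and the `:download_interval` key are hypothetical; only the flag names and the 4-second default come from this thread.

```ruby
require "optparse"

# Hypothetical helper: parses only the proposed interval option.
def parse_delay_options(argv)
  options = { download_interval: 4.0 } # default discussed in this thread
  OptionParser.new do |opts|
    opts.on("-i", "--download-interval SECONDS", Float,
            "Seconds to wait between requests") do |seconds|
      options[:download_interval] = seconds
    end
  end.parse!(argv)
  options
end
```

Passing `-i 2.5` or `--download-interval 2.5` would then set the delay to 2.5 seconds.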

Forage avatar Jan 10 '24 13:01 Forage

Hello, I'm not a coder just a regular guy that ran into this problem. Is there anywhere I can read or watch on how to fix it? I see your fixes but how do I do it? Thanks

MWigginsIII avatar Jan 14 '24 01:01 MWigginsIII

The delay is actually not applied between downloads/requests, but between file-processing steps. For example, I'm resuming a download that has a lot of already-existing files, and the delay is applied even though no request to the service is made for those files.

hlorofos avatar Jan 23 '24 10:01 hlorofos