wayback-machine-downloader
Added Configurable Delay option
This helps avoid the rate-limiting introduced by archive.org
Alternative to #266 that allows for a configurable delay rather than a hardcoded one. It also applies the same delay when fetching snapshots.
Closes #267, maybe #264, #244 and #246
Reference to archive.org implementing rate limiting: https://archive.org/details/toomanyrequests_20191110
I was unable to run rake on this due to an error with rake. I don't think I broke any of the tests, but let me know if you need anything fixed.
I picked -n as the short form of the delay option, based on the Linux nice command. The other good options were already used. Feel free to change this if desired.
@sww1235 it would be nice to update README.md too with your new config option.
I think it would be nice to include a message on "connection refused" too: Connection was refused. You may be rate limited. Try increasing the rate-limit value. See: github.com/foo/bar.
Note: you should add the download delay after the check if the file exists (this line): https://github.com/hartator/wayback-machine-downloader/pull/268/files#diff-012e3d978c45d5eff042c16d88ed89dd9e302c0d3fa43df46a87f82f957fafacL266
Otherwise you will be waiting a long time if you are resuming a partial download.
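To illustrate the ordering suggested here, a minimal sketch follows. It is not the project's actual code: `download_all`, `fetch`, and `sleeper` are hypothetical names, and `fetch` stands in for whatever performs the real HTTP request. The point is only that the existence check comes first, so the delay is paid only after a request was actually made.

```ruby
# Hypothetical helper, not part of wayback-machine-downloader.
# `fetch` is any callable that returns the file body for a path;
# `sleeper` is injectable so the delay can be stubbed out.
def download_all(file_paths, delay:, fetch:, sleeper: Kernel.method(:sleep))
  requests = 0
  file_paths.each do |path|
    if File.exist?(path)
      puts "#{path} already exists, skipping."
      next # no request was made, so there is nothing to throttle
    end

    File.write(path, fetch.call(path))
    requests += 1

    # Delay only after a real request, never after a skipped file.
    sleeper.call(delay) if delay.positive?
  end
  requests
end
```

With this ordering, resuming a partial download costs one delay per file that still has to be fetched, not one per file in the list.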
There should also be a configurable amount of retry attempts per file.
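One possible shape for such a retry option, sketched under assumptions: `with_retries`, `max_retries`, and `sleeper` are hypothetical names, not existing options of the tool, and the warning text is only loosely modeled on the message proposed earlier in this thread. It retries only on `Errno::ECONNREFUSED`, which is what a refused connection raises in Ruby.

```ruby
# Hypothetical helper, not part of wayback-machine-downloader.
# Runs the block, retrying up to `max_retries` extra times when the
# connection is refused, and waiting `delay` seconds between attempts.
def with_retries(max_retries:, delay:, sleeper: Kernel.method(:sleep))
  attempts = 0
  begin
    attempts += 1
    yield
  rescue Errno::ECONNREFUSED
    # Give up and re-raise once the retry budget is used up.
    raise if attempts > max_retries

    warn "Connection was refused. You may be rate limited. " \
         "Try increasing the delay. Retry #{attempts}/#{max_retries}."
    sleeper.call(delay)
    retry
  end
end
```

A per-file download would then be wrapped as `with_retries(max_retries: 3, delay: 4) { ... }`, where the block body is the actual request.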
Great fork, has completely solved the issues I was having before. Thank you.
@sww1235 you are referring to the submit rate limit implemented quite some time ago. It would be good to know what the actual download rate limit is, to make sure the 4-second default is actually a sane delay.
With https://github.com/hartator/wayback-machine-downloader/issues/267#issuecomment-1868090089 it might be that this work-around is no longer essential, but it would be good to have either way.
As for the parameter naming, what about --download-interval (or --interval if you insist on a short name) and -i, to have more self-explanatory names?
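For reference, this naming could be wired up with Ruby's standard `OptionParser` roughly as below. This is only a sketch: `parse_options` and the `:download_interval` key are hypothetical, and the 4.0 default simply mirrors the 4-second default mentioned in this thread.

```ruby
require "optparse"

# Hypothetical parser that only illustrates the suggested option names.
def parse_options(argv)
  options = { download_interval: 4.0 } # seconds; default taken from this thread

  OptionParser.new do |opts|
    opts.on("-i", "--download-interval SECONDS", Float,
            "Seconds to wait between download requests") do |seconds|
      options[:download_interval] = seconds
    end
  end.parse!(argv.dup) # parse a copy so the caller's array is untouched

  options
end
```

Calling `parse_options(%w[-i 2.5])` or `parse_options(%w[--download-interval 2.5])` would set the interval to 2.5 seconds.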
Hello, I'm not a coder just a regular guy that ran into this problem. Is there anywhere I can read or watch on how to fix it? I see your fixes but how do I do it? Thanks
The delay is actually not applied between downloads/requests, but between the processing of each file.
For example, I'm resuming a process that has a lot of "already exists" files, and the delay is applied even though no request to the service is made for such files.