question: is it possible to download a website's history while saving only the first successful copy of each page?

Open devinschumacher opened this issue 7 months ago • 2 comments

any chance of getting a feature where we can download a site across a range of dates? for example 2015 to today, to try every archived copy of a URL but only save the most successful download?

the use case is i'm trying to get a website, but some pages are "blocked by cloudflare" in certain archive.org snapshots

thanks!

devinschumacher avatar Nov 26 '23 04:11 devinschumacher

I don't think waybackpack currently supports this, but would be open to a PR that adds it. One tricky bit might be defining the criteria for "successful", particularly if the HTTP status code does not make it clear.

jsvine avatar Nov 27 '23 20:11 jsvine

> I don't think waybackpack currently supports this, but would be open to a PR that adds it. One tricky bit might be defining the criteria for "successful", particularly if the HTTP status code does not make it clear.

yeah i was thinking that same thing about the criteria.

it would probably be a series of words/patterns that would get added to over time until it was reasonably comprehensive? there might be some stuff in the HTML tags as well. i bet the meta title and description on pages like that would always give it away

what i normally see are things like Cloudflare, Login, Too Many Requests etc.

devinschumacher avatar Nov 28 '23 00:11 devinschumacher
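
For anyone picking this up as a PR, here is a rough sketch of the heuristic discussed above. It is not part of waybackpack; the names `BLOCK_PATTERNS`, `looks_successful`, and `first_successful` are hypothetical, and the pattern list contains only the phrases mentioned in this thread. It assumes the snapshots have already been fetched and are passed in oldest first as `(timestamp, status_code, html)` tuples.

```python
import re
from html.parser import HTMLParser

# Hypothetical starter list, limited to the phrases mentioned in this thread.
# It would need to grow over time, and "login" will misfire on real login pages.
BLOCK_PATTERNS = [r"cloudflare", r"\blog\s?in\b", r"too many requests"]
BLOCK_RE = re.compile("|".join(BLOCK_PATTERNS), re.IGNORECASE)


class _HeadText(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.parts.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def looks_successful(status_code, html):
    """Assumed criteria: HTTP 200 and no block phrase in the title/description."""
    if status_code != 200:
        return False
    parser = _HeadText()
    parser.feed(html)
    return not BLOCK_RE.search(" ".join(parser.parts))


def first_successful(snapshots):
    """Return the timestamp of the first snapshot that passes, or None."""
    for timestamp, status_code, html in snapshots:
        if looks_successful(status_code, html):
            return timestamp
    return None
```

Only the title and meta description are checked, so a normal page that merely mentions Cloudflare in its body is not rejected. Whether that is strict enough is exactly the open question about the criteria.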