JustAnotherArchivist

394 comments

Nice to hear from you. :-)

> in principle, there's no actual reason a web crawl should need anywhere near that amount of disk

It depends. We've had a number...

@systwi-again I don't see any advantage to keeping a meaningless option. Backwards compatibility is not a concern here as commands are issued by humans, not scripts. And as you said,...

> The only potential issue I could think of is if the `pending-large` queue no longer exists and a pipeline using the old, pre-removal code tries to get a job...

But it breaks with concurrency > 1. Or rather, you wouldn't be able to tell which URL redirected where anymore.

Redirect targets are always processed 'immediately'. (Also, there is no pool or random order, just a queue, although links extracted from an individual page get added to the end of...
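The ordering described above (a single FIFO queue, extracted links appended to the end, redirect targets followed immediately rather than re-queued) can be sketched roughly as follows. This is a hypothetical model of the behaviour, not the project's actual code; all names are illustrative.

```python
from collections import deque

# Illustrative crawl data: 'a' redirects to 'a2'; links found on 'b'.
redirects = {'a': 'a2'}
links = {'b': ['d', 'e']}

queue = deque(['a', 'b', 'c'])  # one plain FIFO queue, no pool, no random order
order = []                      # processing order, for illustration

def fetch(url):
    order.append(url)
    target = redirects.get(url)
    if target:
        # Redirect targets are processed 'immediately', bypassing the queue.
        fetch(target)
    # Links extracted from a page get added to the end of the queue.
    queue.extend(links.get(url, []))

while queue:
    fetch(queue.popleft())

print(order)
```

Note how 'a2' is fetched before 'b' even though 'b' was queued first, while the links from 'b' only come up after everything already queued.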

You mean both the redirect response and the redirect target response? Only if you hold back reporting the redirect until completing the chain. So the dashboard wouldn't be showing what's...

Ah, yes, of course. My comment was only about the versions that *don't* include the redirect target on/after the 30x response but rather only the 30x and then the 200,...

That is indeed what causes these crashes. One job in particular produced lines of up to 1.7 MiB. The buffer is only 1 MiB. The fix here is probably to...
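For context, this is the general failure mode with asyncio's stream buffering: `readline` on a `StreamReader` raises once a line exceeds the reader's `limit`. A minimal sketch of avoiding that by raising the limit (the sizes and the fix here are illustrative assumptions, not necessarily what the project ended up doing):

```python
import asyncio

async def main():
    # A reader whose limit is below the line length would raise
    # LimitOverrunError/ValueError on readline; a 2 MiB limit
    # accommodates the ~1.7 MiB lines mentioned above.
    reader = asyncio.StreamReader(limit=2 * 1024 * 1024)
    reader.feed_data(b'x' * (1024 * 1024 + 100) + b'\n')  # a >1 MiB line
    reader.feed_eof()
    line = await reader.readline()
    return len(line)

n = asyncio.run(main())
print(n)  # length of the oversized line, including the newline
```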

Python packaging has changed (read: improved) a fair bit in the past few years. `setuptools` used to be the one and only way to do anything with packages. That's not...
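The modern replacement is declaring metadata in `pyproject.toml`, with `setuptools` demoted to one of several interchangeable build backends. A minimal sketch (the package name and version are placeholders):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "example-package"  # placeholder
version = "0.1.0"
```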

I don't think 100 (Continue) responses are relevant; requests doesn't support `Expect: 100-continue` anyway.