JustAnotherArchivist

394 comments

Nice to hear from you. :-)

> in principle, there's no actual reason a web crawl should need anywhere near that amount of disk

It depends. We've had a number...

@systwi-again I don't see any advantage to keeping a meaningless option. Backwards compatibility is not a concern here as commands are issued by humans, not scripts. And as you said,...

> The only potential issue I could think of is if the `pending-large` queue no longer exists and a pipeline using the old, pre-removal code tries to get a job...

But it breaks with concurrency > 1. Or rather, you wouldn't be able to tell which URL redirected where anymore.

Redirect targets are always processed 'immediately'. (Also, there is no pool or random order, just a queue, although links extracted from an individual page get added to the end of...
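The ordering described above (a single FIFO queue, extracted links appended to the end, redirect targets followed immediately rather than re-queued) can be sketched roughly as follows. This is a hypothetical model of the behaviour, not the project's actual code; all names are illustrative.

```python
from collections import deque

# Illustrative crawl data: 'a' redirects to 'a2'; links found on 'b'.
redirects = {'a': 'a2'}
links = {'b': ['d', 'e']}

queue = deque(['a', 'b', 'c'])  # one plain FIFO queue, no pool, no random order
order = []                      # processing order, for illustration

def fetch(url):
    order.append(url)
    target = redirects.get(url)
    if target:
        # Redirect targets are processed 'immediately', bypassing the queue.
        fetch(target)
    # Links extracted from a page get added to the end of the queue.
    queue.extend(links.get(url, []))

while queue:
    fetch(queue.popleft())

print(order)
```

Note how 'a2' is fetched before 'b' even though 'b' was queued first, while the links from 'b' only come up after everything already queued.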

You mean both the redirect response and the redirect target response? Only if you hold back reporting the redirect until completing the chain. So the dashboard wouldn't be showing what's...

Ah, yes, of course. My comment was only about the versions that *don't* include the redirect target on/after the 30x response but rather only the 30x and then the 200,...

That is indeed what causes these crashes. One job in particular produced lines of up to 1.7 MiB. The buffer is only 1 MiB. The fix here is probably to...
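For context, this is the general failure mode with asyncio's stream buffering: `readline` on a `StreamReader` raises once a line exceeds the reader's `limit`. A minimal sketch of avoiding that by raising the limit (the sizes and the fix here are illustrative assumptions, not necessarily what the project ended up doing):

```python
import asyncio

async def main():
    # A reader whose limit is below the line length would raise
    # LimitOverrunError/ValueError on readline; a 2 MiB limit
    # accommodates the ~1.7 MiB lines mentioned above.
    reader = asyncio.StreamReader(limit=2 * 1024 * 1024)
    reader.feed_data(b'x' * (1024 * 1024 + 100) + b'\n')  # a >1 MiB line
    reader.feed_eof()
    line = await reader.readline()
    return len(line)

n = asyncio.run(main())
print(n)  # length of the oversized line, including the newline
```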

Python packaging has changed (read: improved) a fair bit in the past few years. `setuptools` used to be the one and only way to do anything with packages. That's not...
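The modern replacement is declaring metadata in `pyproject.toml`, with `setuptools` demoted to one of several interchangeable build backends. A minimal sketch (the package name and version are placeholders):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "example-package"  # placeholder
version = "0.1.0"
```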

I don't think 100 (Continue) responses are relevant; requests doesn't support `Expect: 100-continue` anyway.