JustAnotherArchivist

394 comments by JustAnotherArchivist

@thebham24 That's not what this issue is about. Please file a separate one and include examples there.

Real-world example of something that effectively can't be archived due to this without ridiculous manual effort: ShoutWiki. It refuses connections in a pretty much random manner, which then wrecks any...

@ivan It's definitely much better thanks to #558, but I wouldn't consider this solved. If we scale up (there are still at least three big pipelines out of operation), we...

This happened again today for the first time in years on job 439f7f2u6bg6nb3mify0100xe, shortly after it was resumed at full speed following a cookie jar clearing. `lsof` output below. In...

Also, regarding `CLOSE_WAIT`, that might not mean anything. Everything would show up like that eventually on a crashed process, I think, because wpull crashed and thus doesn't clean up the...

This appears to be related to redirect loops. Early on, job ep1s2jjdyxcg4dtwzdumxjenf's open FD count was growing linearly with the number of 'Too many redirects' errors, and the number of...

Yeah, it can be done externally. Here's a more solid approach with `jq` that doesn't break e.g. if a filename contains 'original' or spaces or other weird characters: ia metadata...
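
The command in the comment above is cut off, so the following is not that `jq` invocation. It is only a rough Python sketch of the same idea, under the assumption that the goal is to pick the files whose `source` field is exactly `original` from the JSON that `ia metadata <identifier>` prints. Parsing the JSON and comparing the field means a filename that happens to contain 'original', spaces, or other odd characters cannot cause a false match. The function name and the sample data are hypothetical.

```python
import json


def original_file_names(metadata_json: str) -> list[str]:
    """Return the names of files whose 'source' field is exactly 'original'.

    Assumes the input has the shape printed by `ia metadata <identifier>`:
    a JSON object with a 'files' list of objects carrying 'name' and 'source'.
    """
    metadata = json.loads(metadata_json)
    return [
        entry["name"]
        for entry in metadata.get("files", [])
        if entry.get("source") == "original"
    ]


# Hypothetical sample: the first name contains 'original' and spaces,
# but it is a derivative, so it must not be selected.
sample = json.dumps({
    "files": [
        {"name": "my original file.txt", "source": "derivative"},
        {"name": "scan 01.pdf", "source": "original"},
    ]
})

print(original_file_names(sample))  # ['scan 01.pdf']
```

In practice the JSON would come from the command's output rather than an inline sample, for instance by piping it in and reading `sys.stdin`.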

@jjjake Haven't tested it, but that looks great!

Thanks. This is likely user agent detection on VK's side. Relatedly, the scraper was already broken before: #737

Yeah, I fixed it, but then I changed something else which broke it again.