Ivan Kozik
Which OS did it fail on? The `pyenv`-based build brings in its own Python headers.
But note https://github.com/chfoo/wpull/issues/131
CRIU might be another option for suspending/resuming crawls on Ubuntu 15.10/16.04+. It works by dumping/restoring a snapshot of the process to/from disk. As root, I managed to dump and...
I had some success with `criu dump --tcp-established --shell-job --ghost-limit 20000000 -t PID` followed by `criu restore --tcp-established --shell-job` (in a tmux), but unfortunately grab-site processes crash about 50% of...
Yeah, it would be better if this worked for any crawl that includes a reddit URL, not just those that start with a reddit URL.
Similarly, send `Cookie: NCR=1` to all *.blogspot.com URLs
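A minimal sketch of how that per-host cookie rule could look — `extra_headers` is a hypothetical helper, not an existing grab-site/wpull hook:

```python
from urllib.parse import urlsplit

def extra_headers(url):
    """Return extra request headers for certain hosts.

    Hypothetical helper: sends NCR=1 to blogspot.com and its
    subdomains so Google does not redirect to a country-specific
    domain mid-crawl.
    """
    host = urlsplit(url).hostname or ""
    if host == "blogspot.com" or host.endswith(".blogspot.com"):
        return {"Cookie": "NCR=1"}
    return {}
```

The suffix check covers `*.blogspot.com` without matching unrelated hosts like `notblogspot.com`.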
I could not repro this on macOS 10.14 (with a homebrew install) just now, but systwi says it still happens on 10.13.6 (install method unknown).
I can't repro on macOS 11; was this fixed in lxml?
Possible implementation strategy: Implement #59 so that the user can easily adjust delays on a per-domain basis. For each 429 response, add (# of connections being used * 1 second)...
grab-site currently doesn't really have anyone developing it (I just try to keep the install steps working), but I have no objections to the addition of WACZ support.