grab-site
Crawls sometimes hang forever
Could this be fixed by killing the wpull connections, as documented on the ArchiveTeam wiki's ArchiveBot page?
I haven't tried it, but I assume so. Still, it should not be necessary to rely on a workaround with so many requirements.
As a workaround, you can use the kill-wpull-connections script; it requires pgrep, lsof, and gdb. Depending on the machine configuration (specifically, the value of kernel.yama.ptrace_scope in /proc/sys/kernel/yama/ptrace_scope), it may also require root/sudo privileges.
https://wiki.archiveteam.org/index.php/ArchiveBot
https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/kill-wpull-connections
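Because the workaround depends on external tools and on the Yama ptrace setting, it can help to check those prerequisites before trying it. The sketch below is a hypothetical preflight helper, not part of grab-site or kill-wpull-connections. It looks for pgrep, lsof and gdb on PATH, reads /proc/sys/kernel/yama/ptrace_scope and maps the value to what gdb needs in order to attach (0 or no Yama: same user; 1 or 2: CAP_SYS_PTRACE, usually root; 3: attaching disabled). It also lists processes whose command line mentions "wpull". That check is only a heuristic, but it would reveal the situation where no wpull process exists at all.

```python
import os
import shutil
from pathlib import Path

# Tools the kill-wpull-connections script is said to need.
REQUIRED_TOOLS = ("pgrep", "lsof", "gdb")
PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def read_ptrace_scope(path=PTRACE_SCOPE):
    """Return the Yama ptrace_scope value, or None if Yama is absent/unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def ptrace_requirement(scope):
    """Describe what gdb needs to attach to an unrelated process such as wpull."""
    if scope is None or scope == 0:
        return "same user is enough"
    if scope in (1, 2):
        return "root/sudo (CAP_SYS_PTRACE) required"
    return "ptrace attach disabled"


def wpull_pids(proc=Path("/proc")):
    """Heuristic: PIDs whose command line contains 'wpull' (Linux /proc only)."""
    pids = []
    if not proc.is_dir():
        return pids
    for entry in proc.iterdir():
        if not entry.name.isdigit() or int(entry.name) == os.getpid():
            continue  # skip non-process entries and this script itself
        try:
            cmdline = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        if b"wpull" in cmdline:
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    scope = read_ptrace_scope()
    print("missing tools:", ", ".join(missing_tools()) or "none")
    print("ptrace_scope:", scope, "->", ptrace_requirement(scope))
    print("wpull PIDs:", wpull_pids() or "none found")
```

The substring match can give false positives, for example an editor that has wpull.db open. Treat its output as a hint rather than proof.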
I know, but it would only be a temporary solution.
Not even kill-wpull-connections worked in my most recent case: there was no wpull process running on my system at all. Since wpull.db wasn't corrupt, I was still able to extract the todo and in_progress URLs with gs-dump-urls after killing the crawl.
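Recovering the pending URLs this way can be scripted. The following is a minimal sketch. It assumes gs-dump-urls is invoked as `gs-dump-urls DIR/wpull.db STATUS` and prints one URL per line, and that the crawl has already been killed so wpull.db is no longer being written. The helper names are hypothetical. The sketch collects the todo and in_progress URLs, drops duplicates while preserving order, and writes them to a single file.

```python
import shutil
import subprocess
from pathlib import Path

# The two queue states mentioned above; one gs-dump-urls call per status (assumed).
PENDING_STATUSES = ("todo", "in_progress")


def dump_command(db_path, status):
    """Build the (assumed) gs-dump-urls invocation for one status."""
    return ["gs-dump-urls", str(db_path), status]


def dump_pending_urls(crawl_dir, out_path):
    """Write every todo/in_progress URL from CRAWL_DIR/wpull.db to OUT_PATH."""
    db_path = Path(crawl_dir) / "wpull.db"
    if not db_path.is_file():
        raise FileNotFoundError(db_path)
    if shutil.which("gs-dump-urls") is None:
        raise RuntimeError("gs-dump-urls not found on PATH")
    seen = set()
    urls = []
    for status in PENDING_STATUSES:
        result = subprocess.run(
            dump_command(db_path, status),
            capture_output=True, text=True, check=True,
        )
        for line in result.stdout.splitlines():
            url = line.strip()
            if url and url not in seen:  # skip blanks and duplicates, keep order
                seen.add(url)
                urls.append(url)
    Path(out_path).write_text("".join(url + "\n" for url in urls))
    return len(urls)
```

For example, `dump_pending_urls("/path/to/crawl-dir", "pending-urls.txt")` would leave a plain URL list that can be fed into a fresh crawl.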