Crawls sometimes hang forever

Open ivan opened this issue 9 years ago • 4 comments

ivan avatar Dec 16 '15 12:12 ivan

Could this be fixed by killing the wpull connections (documented on the AT Wiki's ArchiveBot page)?

TheTechRobo avatar Oct 24 '21 22:10 TheTechRobo

I haven't tried it, but I assume so. It should not be necessary to use a tool with so many restrictions though.

As a workaround, you can use the kill-wpull-connections script, which requires pgrep, lsof, and gdb. Depending on the machine's configuration (specifically the value of kernel.yama.ptrace_scope, readable at /proc/sys/kernel/yama/ptrace_scope), it may also require root/sudo privileges.

https://wiki.archiveteam.org/index.php/ArchiveBot

https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/kill-wpull-connections
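
For anyone who wants to check the prerequisites before running the script, here is a rough sketch in Python. It is not part of grab-site or kill-wpull-connections; the function names are made up for illustration. It looks for the three tools on PATH and reads the Yama ptrace_scope value. The interpretation of each value follows the kernel's Yama documentation, but whether the script works on a given machine still depends on how gdb is invoked there.

```python
import shutil
from pathlib import Path

# Tools named in the comment above as requirements of kill-wpull-connections.
REQUIRED_TOOLS = ("pgrep", "lsof", "gdb")
PTRACE_SCOPE_PATH = "/proc/sys/kernel/yama/ptrace_scope"


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Return the Yama ptrace_scope value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_ptrace_scope(scope):
    """Summarize what a ptrace_scope value means for attaching gdb to wpull."""
    if scope is None:
        return "ptrace_scope not readable; Yama is probably not enabled"
    if scope == 0:
        return "attaching to your own processes should work without root"
    if scope == 1:
        return "attaching to a non-child process likely needs root/sudo"
    if scope == 2:
        return "only processes with CAP_SYS_PTRACE (e.g. root) may attach"
    return "attaching is disabled entirely, even for root"


if __name__ == "__main__":
    absent = missing_tools()
    print("missing tools:", ", ".join(absent) if absent else "none")
    print("ptrace_scope:", describe_ptrace_scope(read_ptrace_scope()))
```

If the scope is 1 or 2, running the script with sudo is the simplest option; at 3, the kernel refuses ptrace attachment until the setting is changed and the machine is rebooted.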

ivan avatar Oct 25 '21 04:10 ivan

I know, but it'd be a temporary solution.

TheTechRobo avatar Oct 25 '21 12:10 TheTechRobo

Not even kill-wpull-connections worked in my recent case of this: there was no wpull process running on my system at all. Since wpull.db wasn't corrupt, though, I was able to extract the todo and in_progress URLs with gs-dump-urls after killing the crawl.
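
A minimal sketch of that recovery step, assuming gs-dump-urls is invoked as `gs-dump-urls DIR/wpull.db STATUS` (this matches my recollection of the grab-site README, but check it against your installed version). The helper names and the output file name are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# The two queue states mentioned above; URLs in either state were never finished.
STATUSES = ("todo", "in_progress")


def dump_commands(crawl_dir, statuses=STATUSES):
    """Build one gs-dump-urls command per status (assumed CLI: DB path, then status)."""
    db_path = Path(crawl_dir) / "wpull.db"
    return [["gs-dump-urls", str(db_path), status] for status in statuses]


def dump_urls(crawl_dir, out_path):
    """Run each command in turn and collect all of its output in one file."""
    with open(out_path, "w") as out:
        for command in dump_commands(crawl_dir):
            # check=True raises if gs-dump-urls fails, e.g. on a corrupt database.
            subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    # Usage: python dump_remaining.py CRAWL_DIR
    dump_urls(sys.argv[1], "remaining-urls.txt")
```

The resulting file could then be used as the input list for a fresh crawl.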

TheTechRobo avatar Jan 22 '22 17:01 TheTechRobo