browsertrix-crawler
Crawler SIGTERMs itself while running
Hi there.
When I run browsertrix-crawler on large sites (e.g. www.androidcentral.com), the crawler crashes on its own after 2-3 hours of crawling; specifically, it appears to SIGTERM itself.
Terminal output is below:
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url https://www.<somewebsite>.com/ --generateWACZ --text --workers 8 --collection androidcentral
Text Extraction: Enabled
Load timeout for https://<somewebsite>.com/968 TimeoutError: Navigation timeout of 90000 ms exceeded
  at /app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111
Load timeout for https://<somewebsite>.com/793 TimeoutError: Navigation timeout of 90000 ms exceeded
  at /app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111
Load timeout for https://<somewebsite>.com/595 TimeoutError: Navigation timeout of 90000 ms exceeded
  at /app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111
Load timeout for https://<somewebsite>.com/1145 Error: net::ERR_TOO_MANY_RETRIES at https://<somewebsite>.com/1145
  at navigate (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/FrameManager.js:115:23)
  at runMicrotasks (<anonymous>)
  at processTicksAndRejections (node:internal/process/task_queues:96:5)
  at async FrameManager.navigateFrame (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/FrameManager.js:90:21)
  at async Frame.goto (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/FrameManager.js:416:16)
  at async Page.goto (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Page.js:789:16)
  at async Crawler.loadPage (/app/crawler.js:427:7)
  at async Crawler.module.exports [as driver] (/app/defaultDriver.js:3:3)
  at async Crawler.crawlPage (/app/crawler.js:258:7)
SIGTERM received, exiting
ERRO[12641] error waiting for container: unexpected EOF
Any way to resolve this? Thanks!
It's hard to say exactly, as it's not easy to repro, but I suspect it's a running-out-of-memory issue (RAM exhaustion). At least that's what searching for 'error waiting for container: EOF' seems to suggest.
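If memory is the culprit, one quick way to check is to watch the container's usage while the crawl runs. A minimal sketch using standard Docker tooling (the filter assumes the container was started from the webrecorder/browsertrix-crawler image):

# watch live memory/CPU usage of the running crawler container
docker stats $(docker ps -q --filter ancestor=webrecorder/browsertrix-crawler)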
You could try running with fewer workers to see if it happens again, and then try docker inspect on the container to see if there is more info on why it was SIGTERMed, as shown below.
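For example, something like this, where the container ID is whatever docker ps -a shows for the finished crawl:

# did the kernel OOM killer terminate the container, and with what exit code?
docker inspect --format '{{.State.OOMKilled}} exit={{.State.ExitCode}}' <container-id>

If OOMKilled comes back true, reducing workers (or giving the Docker VM more memory) is the likely fix.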
The next release will make it a bit easier to restart the crawl. With 0.5.0, the container will automatically try to save the crawl config + current state of the crawl to the crawls directory so that it can be restarted. You could try running with that version, currently on the https://github.com/webrecorder/browsertrix-crawler/tree/save-state branch.
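Restarting from the saved state would then look roughly like this. This is a sketch: it assumes the saved state is written as a YAML file somewhere under the mounted crawls volume and is loadable via the crawler's --config option; the exact path and filename below are illustrative:

# resume a crawl from a previously saved state file (path is hypothetical)
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --config /crawls/collections/androidcentral/crawls/saved-state.yaml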
Thanks. I am running jobs with 8 workers. Is that considered too many, or is it within the acceptable range for browsertrix-crawler (i.e., it should not crash)?
I can report that I've seen a strange SIGTERM as well, on an Ubuntu snap-based Docker deployment. Of course, once I added extra logging and restarted the daemon, it didn't happen! So one possibility is to try restarting the Docker daemon if it happens again. Will update if I find more info.
Haven't been able to repro since, but I suspect it was OOM-killed. We now have Periodic State Saving, which can help if this happens in the future. Closing, since an exact repro is uncertain, and it's not clear this is fully solvable if it is an OOM kill.
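For reference, enabling periodic state saving looks roughly like this, assuming the current CLI flag names (--saveState and --saveStateInterval; check crawl --help on your version, as exact names and defaults may differ):

# save crawl state periodically so an interrupted crawl can be resumed
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url https://www.<somewebsite>.com/ --generateWACZ --workers 4 --saveState always --saveStateInterval 300 --collection androidcentral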
Comment if there is a specific repro.