Browsertrix Crawler stops on "disk full" although the disk is not full

Open benoit74 opened this issue 10 months ago • 0 comments

Browsertrix crawler: version 1.0.0-beta.6

This occurred on Zimit 2 but might be unrelated to it: it could be either a crawler problem or a Docker / Zimfarm issue.

Recipe: https://farm.openzim.org/recipes/bbc.com_persian
Task: https://farm.openzim.org/pipeline/29c24848-9c12-4253-8939-77254b01fdd5


{"timestamp":"2024-03-21T11:46:53.570Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.bbc.com/persian/articles/c9rw9yr764eo","workerid":0}}
{"timestamp":"2024-03-21T11:46:53.578Z","logLevel":"info","context":"general","message":"Disk utilization projected to reach threshold 90% > 90%, stopping","details":{}}
{"timestamp":"2024-03-21T11:46:53.578Z","logLevel":"info","context":"general","message":"Crawler interrupted, gracefully finishing current pages","details":{}}
{"timestamp":"2024-03-21T11:46:53.578Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2024-03-21T11:46:53.852Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmp9jqp9697/collections/crawl-20240319073437143/crawls/crawl-20240321114653-a14d9f23d744.yaml","details":{}}
{"timestamp":"2024-03-21T11:46:53.864Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":22833,"total":43845,"pending":0,"failed":4,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2024-03-21T11:46:53.865Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2024-03-21T11:46:53.866Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
[zimit::2024-03-21 11:46:53,893] INFO:crawl interupted by a limit

I will investigate a bit before reporting upstream. I first need to confirm that this is not a problem linked to how Zimfarm handles Docker containers or to our custom image.
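As a starting point for that investigation, here is a minimal Python sketch for measuring disk utilization as seen from inside the container, so it can be compared with what the host reports. It is not the crawler's actual logic: the function names, the `extra_bytes` parameter, and the simple "used plus expected extra" projection are all assumptions made for illustration. Only the 90% threshold and the `/output` path come from the log above.

```python
import shutil


def utilization_percent(used: int, total: int) -> float:
    """Current utilization of a filesystem, in percent."""
    return 100.0 * used / total


def projected_percent(used: int, total: int, extra_bytes: int) -> float:
    """Utilization if `extra_bytes` more were written.

    Hypothetical projection for illustration; the crawler may compute
    its own projection differently.
    """
    return 100.0 * (used + extra_bytes) / total


def report(path: str, extra_bytes: int = 0, threshold: float = 90.0) -> dict:
    """Measure the filesystem holding `path` and compare to `threshold`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    current = utilization_percent(usage.used, usage.total)
    projected = projected_percent(usage.used, usage.total, extra_bytes)
    return {
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "current_percent": current,
        "projected_percent": projected,
        "over_threshold": projected > threshold,
    }


if __name__ == "__main__":
    # Inside the crawler container, pass the output volume (e.g. "/output").
    print(report("."))
```

Running this against `/output` inside the container and comparing the result with `df` on the host should show whether the container sees a different filesystem size or usage than expected.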

benoit74, Mar 25 '24 10:03