zimit
Browsertrix Crawler stops with a "disk full" condition although the disk is not full
Browsertrix crawler: version 1.0.0-beta.6
This occurred on Zimit 2 but may be unrelated to it: it could be either a crawler problem or a Docker / Zimfarm issue.
Recipe: https://farm.openzim.org/recipes/bbc.com_persian Task: https://farm.openzim.org/pipeline/29c24848-9c12-4253-8939-77254b01fdd5
{"timestamp":"2024-03-21T11:46:53.570Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.bbc.com/persian/articles/c9rw9yr764eo","workerid":0}}
{"timestamp":"2024-03-21T11:46:53.578Z","logLevel":"info","context":"general","message":"Disk utilization projected to reach threshold 90% > 90%, stopping","details":{}}
{"timestamp":"2024-03-21T11:46:53.578Z","logLevel":"info","context":"general","message":"Crawler interrupted, gracefully finishing current pages","details":{}}
{"timestamp":"2024-03-21T11:46:53.578Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2024-03-21T11:46:53.852Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmp9jqp9697/collections/crawl-20240319073437143/crawls/crawl-20240321114653-a14d9f23d744.yaml","details":{}}
{"timestamp":"2024-03-21T11:46:53.864Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":22833,"total":43845,"pending":0,"failed":4,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2024-03-21T11:46:53.865Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2024-03-21T11:46:53.866Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
[zimit::2024-03-21 11:46:53,893] INFO:crawl interupted by a limit
I will investigate a bit before reporting upstream. I first need to confirm this is not a problem linked to how the Zimfarm handles Docker containers or to our custom image.
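As a starting point for that investigation, here is a minimal sketch for checking what the container actually sees on the output volume. It is not the crawler's own check: the "projected" figure assumes, purely for illustration, that the projection is current usage plus the size of the crawl directory, and the helper names `dir_size_bytes` and `disk_report` are hypothetical. The 90% default only mirrors the threshold shown in the log above.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files found under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is unreadable while walking; skip it.
                pass
    return total


def disk_report(path: str, threshold_pct: float = 90.0) -> dict:
    """Report current and (assumed) projected usage of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    archive = dir_size_bytes(path)
    total = usage.total or 1  # guard against a filesystem reporting 0 bytes
    used_pct = 100 * usage.used / total
    # ASSUMPTION: projection = current usage + size of the crawl directory.
    projected_pct = 100 * (usage.used + archive) / total
    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "archive_bytes": archive,
        "used_pct": used_pct,
        "projected_pct": projected_pct,
        "over_threshold": projected_pct >= threshold_pct,
    }
```

Running `disk_report("/output")` inside the container and again on the host for the mounted directory should show whether the two disagree about total or used space, which would point to Docker or the Zimfarm rather than the crawler.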