How to resume an interrupted crawl?
Hi, my last two crawls both ended the same way: the crawler keeps crawling, then suddenly stops and exits without cleaning up the temporary files. Here are some logs:
{"timestamp":"2025-03-29T17:25:47.825Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://unbekomi ng.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","workerid":2}} {"timestamp":"2025-03-29T17:25:47.910Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"htt ps://unbekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","frameId":"2127892DBBD58DE42023FEC5F92BCE32"}} {"timestamp":"2025-03-29T17:25:47.912Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://unbekoming.s ubstack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","errorText":"net::ERR_NAME_NOT_RESOLVED","type":"Document","status":0 ,"page":"https://unbekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","workerid":1}} {"timestamp":"2025-03-29T17:25:47.916Z","logLevel":"error","context":"pageStatus","message":"Page Load Failed: retry limit reached","details":{" retry":2,"retries":2,"msg":"net::ERR_NAME_NOT_RESOLVED at https://unbekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_med ium=web","url":"https://unbekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","loadState":0,"page":"https://unb ekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","workerid":1}} {"timestamp":"2025-03-29T17:25:47.920Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"htt ps://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=paywa ll&utm_medium=web&utm_content=152551674","frameId":"CD9C46BBDECE6AB309340381DF12A76E"}} {"timestamp":"2025-03-29T17:25:47.927Z","logLevel":"warn","context":"pageStatus","message":"Page date missing, setting to now","details":{"url": 
"https://unbekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","ts":"2025-03-29T17:25:47.927Z"}} {"timestamp":"2025-03-29T17:25:47.958Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"htt ps://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=paywa ll&utm_medium=web&utm_content=152551674","frameId":"CEAFC6F57C23C9152AD66CD799818052"}} {"timestamp":"2025-03-29T17:25:47.958Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"htt ps://unbekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","frameId":"086BC81D2864C05A704253AF8CDD7FDD"}} {"timestamp":"2025-03-29T17:25:47.963Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://unbekoming.s ubstack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=paywall&utm_medium=web &utm_content=152551674","errorText":"net::ERR_NAME_NOT_RESOLVED","type":"Document","status":0,"page":"https://unbekoming.substack.com/subscribe? 
simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=paywall&utm_medium=web&utm_content=152551674 ","workerid":0}} {"timestamp":"2025-03-29T17:25:47.965Z","logLevel":"error","context":"pageStatus","message":"Page Load Failed: retry limit reached","details":{" retry":2,"retries":2,"msg":"net::ERR_NAME_NOT_RESOLVED at https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.su bstack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=paywall&utm_medium=web&utm_content=152551674","url":"https://unbekoming.substack.com/sub scribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=paywall&utm_medium=web&utm_content=15 2551674","loadState":0,"page":"https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fc omment%2F80087157&utm_source=paywall&utm_medium=web&utm_content=152551674","workerid":0}} {"timestamp":"2025-03-29T17:25:48.046Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://unbekoming.s ubstack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=paywall&utm_medium=web &utm_content=152551674","errorText":"net::ERR_NAME_NOT_RESOLVED","type":"Document","status":0,"page":"https://unbekoming.substack.com/subscribe? 
simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=paywall&utm_medium=web&utm_content=152551674 ","workerid":3}} {"timestamp":"2025-03-29T17:25:48.048Z","logLevel":"error","context":"pageStatus","message":"Page Load Failed: retry limit reached","details":{" retry":2,"retries":2,"msg":"net::ERR_NAME_NOT_RESOLVED at https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.su bstack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=paywall&utm_medium=web&utm_content=152551674","url":"https://unbekoming.substack.com/sub scribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=paywall&utm_medium=web&utm_content=15 2551674","loadState":0,"page":"https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fc omment%2F80068478&utm_source=paywall&utm_medium=web&utm_content=152551674","workerid":3}} {"timestamp":"2025-03-29T17:25:48.050Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://unbekoming.s ubstack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","errorText":"net::ERR_NAME_NOT_RESOLVED","type":"Document","status":0 ,"page":"https://unbekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","workerid":2}} {"timestamp":"2025-03-29T17:25:48.053Z","logLevel":"error","context":"pageStatus","message":"Page Load Failed: retry limit reached","details":{" retry":2,"retries":2,"msg":"net::ERR_NAME_NOT_RESOLVED at https://unbekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_med ium=web","url":"https://unbekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","loadState":0,"page":"https://unb ekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","workerid":2}} 
{"timestamp":"2025-03-29T17:25:48.060Z","logLevel":"warn","context":"pageStatus","message":"Page date missing, setting to now","details":{"url": "https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=p aywall&utm_medium=web&utm_content=152551674","ts":"2025-03-29T17:25:48.060Z"}} {"timestamp":"2025-03-29T17:25:58.049Z","logLevel":"error","context":"worker","message":"Page Close Timed Out","details":{"seconds":10,"page":"h ttps://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=pay wall&utm_medium=web&utm_content=152551674","workerid":3}} {"timestamp":"2025-03-29T17:25:58.050Z","logLevel":"warn","context":"pageStatus","message":"Page date missing, setting to now","details":{"url": "https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=p aywall&utm_medium=web&utm_content=152551674","ts":"2025-03-29T17:25:58.050Z"}} {"timestamp":"2025-03-29T17:25:58.053Z","logLevel":"error","context":"worker","message":"Page Close Timed Out","details":{"seconds":10,"page":"h ttps://unbekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","workerid":2}} {"timestamp":"2025-03-29T17:25:58.054Z","logLevel":"warn","context":"pageStatus","message":"Page date missing, setting to now","details":{"url": "https://unbekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","ts":"2025-03-29T17:25:58.054Z"}} {"timestamp":"2025-03-29T17:26:07.975Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid" :1}} {"timestamp":"2025-03-29T17:26:07.976Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":1,"type":" exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage 
(file:///app/dist/util/worker.js:95:27)\n at async Pag eWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Pro mise.allSettled (index 1)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler .js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-29T17:26:08.097Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid" :0}} {"timestamp":"2025-03-29T17:26:08.098Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":0,"type":" exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async Pag eWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Pro mise.allSettled (index 0)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler .js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-29T17:26:08.476Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://un bekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","workerid":1}} {"timestamp":"2025-03-29T17:26:08.598Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://un bekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=paywall&utm_ medium=web&utm_content=152551674","workerid":0}} {"timestamp":"2025-03-29T17:26:18.111Z","logLevel":"warn","context":"worker","message":"New Window Timed 
Out","details":{"seconds":20,"workerid" :3}} {"timestamp":"2025-03-29T17:26:18.112Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":3,"type":" exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at runNextTi cks (node:internal/process/task_queues:60:5)\n at listOnTimeout (node:internal/timers:545:9)\n at process.processTimers (node:internal/tim ers:519:7)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker. js:206:13)\n at async Promise.allSettled (index 3)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.cra wl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)"}} {"timestamp":"2025-03-29T17:26:18.112Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid" :2}} {"timestamp":"2025-03-29T17:26:18.112Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":2,"type":" exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async Pag eWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Pro mise.allSettled (index 2)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler .js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-29T17:26:18.612Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://un bekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80068478&utm_source=paywall&utm_ 
medium=web&utm_content=152551674","workerid":3}} {"timestamp":"2025-03-29T17:26:18.613Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://un bekoming.substack.com/p/bovaer/comment/80068478?utm_campaign=unknown&utm_medium=web","workerid":2}} {"timestamp":"2025-03-29T17:26:28.487Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid" :1}} {"timestamp":"2025-03-29T17:26:28.487Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":1,"type":" exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async Pag eWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Pro mise.allSettled (index 1)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler .js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-29T17:26:28.599Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid" :0}} {"timestamp":"2025-03-29T17:26:28.599Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":0,"type":" exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async Pag eWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Pro mise.allSettled (index 0)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler .js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} 
{"timestamp":"2025-03-29T17:26:28.988Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://un bekoming.substack.com/p/bovaer/comment/80195058?utm_campaign=unknown&utm_medium=web","workerid":1}} {"timestamp":"2025-03-29T17:26:28.989Z","logLevel":"error","context":"worker","message":"Worker error, exiting","details":{"type":"exception","m essage":"no page available, shouldn't get here","stack":"Error: no page available, shouldn't get here\n at PageWorker.initPage (file:///app/d ist/util/worker.js:161:15)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/d ist/util/worker.js:206:13)\n at async Promise.allSettled (index 1)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at a sync Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app /dist/main.js:58:1","workerid":1}} {"timestamp":"2025-03-29T17:26:29.099Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://un bekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80087157&utm_source=paywall&utm_ medium=web&utm_content=152551674","workerid":0}} {"timestamp":"2025-03-29T17:26:29.099Z","logLevel":"error","context":"worker","message":"Worker error, exiting","details":{"type":"exception","m essage":"no page available, shouldn't get here","stack":"Error: no page available, shouldn't get here\n at PageWorker.initPage (file:///app/d ist/util/worker.js:161:15)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/d ist/util/worker.js:206:13)\n at async Promise.allSettled (index 0)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at a sync Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n 
at async file:///app /dist/main.js:58:1","workerid":0}} {"timestamp":"2025-03-29T17:26:29.811Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":3,"page":"https://u nbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=paywall&utm _medium=web&utm_content=152551674"}} {"timestamp":"2025-03-29T17:26:29.811Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":2,"page":"https://u nbekoming.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web"}} {"timestamp":"2025-03-29T17:26:29.811Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":20625,"total ":26051,"pending":4,"failed":2658,"limit":{"max":0,"hit":false},"pendingPages":["{\"extraHops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:58 .101Z\",\"url\":\"https:\\/\\/unbekoming.substack.com\\/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment% 2F80077632&utm_source=paywall&utm_medium=web&utm_content=152551674\",\"depth\":7,\"added\":\"2025-03-29T14:34:47.542Z\",\"retry\":2}","{\"extraH ops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:58.104Z\",\"url\":\"https:\\/\\/unbekoming.substack.com\\/p\\/bovaer\\/comment\\/80083995?ut m_campaign=unknown&utm_medium=web\",\"depth\":7,\"added\":\"2025-03-29T14:34:48.523Z\",\"retry\":2}","{\"extraHops\":0,\"seedId\":0,\"started\": \"2025-03-29T17:25:48.097Z\",\"url\":\"https:\\/\\/unbekoming.substack.com\\/p\\/bovaer\\/comment\\/80077632?utm_campaign=unknown&utm_medium=web \",\"depth\":7,\"added\":\"2025-03-29T14:34:47.542Z\",\"retry\":2}","{\"extraHops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:47.965Z\",\"ur l\":\"https:\\/\\/unbekoming.substack.com\\/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80195058&u 
tm_source=paywall&utm_medium=web&utm_content=152551674\",\"depth\":7,\"added\":\"2025-03-29T14:34:46.441Z\",\"retry\":2}"]}} {"timestamp":"2025-03-29T17:26:29.812Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":20625,"total ":26051,"pending":4,"failed":2658,"limit":{"max":0,"hit":false},"pendingPages":["{\"extraHops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:58 .101Z\",\"url\":\"https:\\/\\/unbekoming.substack.com\\/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment% 2F80077632&utm_source=paywall&utm_medium=web&utm_content=152551674\",\"depth\":7,\"added\":\"2025-03-29T14:34:47.542Z\",\"retry\":2}","{\"extraH ops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:58.104Z\",\"url\":\"https:\\/\\/unbekoming.substack.com\\/p\\/bovaer\\/comment\\/80083995?ut m_campaign=unknown&utm_medium=web\",\"depth\":7,\"added\":\"2025-03-29T14:34:48.523Z\",\"retry\":2}","{\"extraHops\":0,\"seedId\":0,\"started\": \"2025-03-29T17:25:48.097Z\",\"url\":\"https:\\/\\/unbekoming.substack.com\\/p\\/bovaer\\/comment\\/80077632?utm_campaign=unknown&utm_medium=web \",\"depth\":7,\"added\":\"2025-03-29T14:34:47.542Z\",\"retry\":2}","{\"extraHops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:47.965Z\",\"ur l\":\"https:\\/\\/unbekoming.substack.com\\/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80195058&u tm_source=paywall&utm_medium=web&utm_content=152551674\",\"depth\":7,\"added\":\"2025-03-29T14:34:46.441Z\",\"retry\":2}"]}} {"timestamp":"2025-03-29T17:26:29.823Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://unbekomi ng.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=paywall&utm_medium =web&utm_content=152551674","workerid":3}} {"timestamp":"2025-03-29T17:26:29.823Z","logLevel":"info","context":"general","message":"Awaiting page 
load","details":{"page":"https://unbekomi ng.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web","workerid":2}} {"timestamp":"2025-03-29T17:26:29.931Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"htt ps://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=paywa ll&utm_medium=web&utm_content=152551674","frameId":"3F86136508D0EAD6C18E366B708619DD"}} {"timestamp":"2025-03-29T17:26:29.931Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"htt ps://unbekoming.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web","frameId":"265A785ABF3A33DA8D85442A671579AA"}} {"timestamp":"2025-03-29T17:26:29.935Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://unbekoming.s ubstack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=paywall&utm_medium=web &utm_content=152551674","errorText":"net::ERR_NAME_NOT_RESOLVED","type":"Document","status":0,"page":"https://unbekoming.substack.com/subscribe? 
simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=paywall&utm_medium=web&utm_content=152551674 ","workerid":3}} {"timestamp":"2025-03-29T17:26:29.939Z","logLevel":"error","context":"pageStatus","message":"Page Load Failed: retry limit reached","details":{" retry":2,"retries":2,"msg":"net::ERR_NAME_NOT_RESOLVED at https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.su bstack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=paywall&utm_medium=web&utm_content=152551674","url":"https://unbekoming.substack.com/sub scribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=paywall&utm_medium=web&utm_content=15 2551674","loadState":0,"page":"https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fc omment%2F80077632&utm_source=paywall&utm_medium=web&utm_content=152551674","workerid":3}} {"timestamp":"2025-03-29T17:26:29.941Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://unbekoming.s ubstack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web","errorText":"net::ERR_NAME_NOT_RESOLVED","type":"Document","status":0 ,"page":"https://unbekoming.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web","workerid":2}} {"timestamp":"2025-03-29T17:26:29.944Z","logLevel":"error","context":"pageStatus","message":"Page Load Failed: retry limit reached","details":{" retry":2,"retries":2,"msg":"net::ERR_NAME_NOT_RESOLVED at https://unbekoming.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_med ium=web","url":"https://unbekoming.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web","loadState":0,"page":"https://unb ekoming.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web","workerid":2}} 
{"timestamp":"2025-03-29T17:26:29.957Z","logLevel":"warn","context":"pageStatus","message":"Page date missing, setting to now","details":{"url": "https://unbekoming.substack.com/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80077632&utm_source=p aywall&utm_medium=web&utm_content=152551674","ts":"2025-03-29T17:26:29.957Z"}} {"timestamp":"2025-03-29T17:26:29.986Z","logLevel":"warn","context":"pageStatus","message":"Page date missing, setting to now","details":{"url": "https://unbekoming.substack.com/p/bovaer/comment/80083995?utm_campaign=unknown&utm_medium=web","ts":"2025-03-29T17:26:29.986Z"}} {"timestamp":"2025-03-29T17:26:29.988Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":3 }} {"timestamp":"2025-03-29T17:26:30.019Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":2 }} {"timestamp":"2025-03-29T17:26:30.271Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmp3xx0v112/collections /crawl-20250329022643829/crawls/crawl-20250329172630-1d92756dd67a.yaml","details":{}} {"timestamp":"2025-03-29T17:26:30.320Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":20625,"total ":26051,"pending":2,"failed":2660,"limit":{"max":0,"hit":false},"pendingPages":["{\"extraHops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:48 .097Z\",\"url\":\"https:\\/\\/unbekoming.substack.com\\/p\\/bovaer\\/comment\\/80077632?utm_campaign=unknown&utm_medium=web\",\"depth\":7,\"adde d\":\"2025-03-29T14:34:47.542Z\",\"retry\":2}","{\"extraHops\":0,\"seedId\":0,\"started\":\"2025-03-29T17:25:47.965Z\",\"url\":\"https:\\/\\/unb ekoming.substack.com\\/subscribe?simple=true&next=https%3A%2F%2Funbekoming.substack.com%2Fp%2Fbovaer%2Fcomment%2F80195058&utm_source=paywall&utm _medium=web&utm_content=152551674\",\"depth\":7,\"added\":\"2025-03-29T14:34:46.441Z\",\"retry\":2}"]}} 
{"timestamp":"2025-03-29T17:26:30.321Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-03-29T17:26:30.322Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
[zimit::2025-03-29 17:26:30,416] ERROR:Crawl returned an error: 10, scraper exiting
[zimit::2025-03-29 17:26:30,422] INFO:Temporary files have been kept in /output/.tmp3xx0v112, please clean them up manually once you don't need them anymore
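In case it helps with reading a log this long, here is a rough sketch of how the entries could be tallied by level and message, to see which warnings and errors dominate. It assumes the raw log has one JSON object per line (the paste above is wrapped, so it would not parse as-is); the function name `summarize` and the `sample` list are just illustrative, and anything that is not valid JSON is skipped rather than reported.

```python
import json
from collections import Counter


def summarize(lines):
    """Count (logLevel, message) pairs across JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON entry, e.g. the "[zimit::...]" lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or line-wrapped entry
        counts[(str(entry.get("logLevel")), str(entry.get("message")))] += 1
    return counts


# Two entries copied from the log above, plus one zimit line that is skipped.
sample = [
    '{"timestamp":"2025-03-29T17:26:30.321Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}',
    '{"timestamp":"2025-03-29T17:26:30.322Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}',
    "[zimit::2025-03-29 17:26:30,416] ERROR:Crawl returned an error: 10, scraper exiting",
]

for (level, message), n in summarize(sample).most_common():
    print(f"{n:5d}  {level:<5}  {message}")
```

To run it on a saved log, pass an open file instead of `sample`, for example `summarize(open("crawl.log", encoding="utf-8"))`, where `crawl.log` is a hypothetical filename.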
And here is part of the other log:
{"timestamp":"2025-03-28T00:03:07.671Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://bartoll.se/search/fructose+liver+damage/page/2/","workerid":3}} {"timestamp":"2025-03-28T00:03:07.698Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/6/","workerid":2}} {"timestamp":"2025-03-28T00:03:07.706Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://bartoll.se/search/fructose+liver+damage/page/3/","workerid":4}} {"timestamp":"2025-03-28T00:03:07.715Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/2/","workerid":0}} {"timestamp":"2025-03-28T00:03:07.743Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://bartoll.se/page/4/?s=inflammation+healing+response+tissue+nutrients","workerid":1}} {"timestamp":"2025-03-28T00:04:37.670Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will retry","details":{"retry":0,"retries":2,"msg":"Navigation timeout of 90000 ms exceeded","url":"https://bartoll.se/search/fructose+liver+damage/page/2/","loadState":0,"page":"https://bartoll.se/search/fructose+liver+damage/page/2/","workerid":3}} {"timestamp":"2025-03-28T00:04:37.694Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will retry","details":{"retry":0,"retries":2,"msg":"Navigation timeout of 90000 ms exceeded","url":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/6/","loadState":0,"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/6/","workerid":2}} {"timestamp":"2025-03-28T00:04:37.702Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will 
retry","details":{"retry":0,"retries":2,"msg":"Navigation timeout of 90000 ms exceeded","url":"https://bartoll.se/search/fructose+liver+damage/page/3/","loadState":0,"page":"https://bartoll.se/search/fructose+liver+damage/page/3/","workerid":4}} {"timestamp":"2025-03-28T00:04:37.712Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will retry","details":{"retry":0,"retries":2,"msg":"Navigation timeout of 90000 ms exceeded","url":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/2/","loadState":0,"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/2/","workerid":0}} {"timestamp":"2025-03-28T00:04:37.739Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will retry","details":{"retry":0,"retries":2,"msg":"Navigation timeout of 90000 ms exceeded","url":"https://bartoll.se/page/4/?s=inflammation+healing+response+tissue+nutrients","loadState":0,"page":"https://bartoll.se/page/4/?s=inflammation+healing+response+tissue+nutrients","workerid":1}} {"timestamp":"2025-03-28T00:04:47.676Z","logLevel":"error","context":"worker","message":"Page Close Timed Out","details":{"seconds":10,"page":"https://bartoll.se/search/fructose+liver+damage/page/2/","workerid":3}} {"timestamp":"2025-03-28T00:04:47.710Z","logLevel":"error","context":"worker","message":"Page Close Timed Out","details":{"seconds":10,"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/6/","workerid":2}} {"timestamp":"2025-03-28T00:04:47.711Z","logLevel":"error","context":"worker","message":"Page Close Timed Out","details":{"seconds":10,"page":"https://bartoll.se/search/fructose+liver+damage/page/3/","workerid":4}} {"timestamp":"2025-03-28T00:04:47.712Z","logLevel":"error","context":"worker","message":"Page Close Timed Out","details":{"seconds":10,"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/2/","workerid":0}} 
{"timestamp":"2025-03-28T00:04:47.750Z","logLevel":"error","context":"worker","message":"Page Close Timed Out","details":{"seconds":10,"page":"https://bartoll.se/page/4/?s=inflammation+healing+response+tissue+nutrients","workerid":1}} {"timestamp":"2025-03-28T00:05:07.725Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":3}} {"timestamp":"2025-03-28T00:05:07.725Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":3,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 3)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-28T00:05:07.818Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":0}} {"timestamp":"2025-03-28T00:05:07.819Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":0,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at listOnTimeout (node:internal/timers:545:9)\n at process.processTimers (node:internal/timers:519:7)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 0)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run 
(file:///app/dist/crawler.js:359:13)"}} {"timestamp":"2025-03-28T00:05:07.819Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":2}} {"timestamp":"2025-03-28T00:05:07.819Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":2,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at listOnTimeout (node:internal/timers:545:9)\n at process.processTimers (node:internal/timers:519:7)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 2)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)"}} {"timestamp":"2025-03-28T00:05:07.819Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":4}} {"timestamp":"2025-03-28T00:05:07.819Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":4,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 4)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-28T00:05:07.851Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":1}} 
{"timestamp":"2025-03-28T00:05:07.851Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":1,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 1)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-28T00:05:08.226Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/fructose+liver+damage/page/2/","workerid":3}} {"timestamp":"2025-03-28T00:05:08.319Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/2/","workerid":0}} {"timestamp":"2025-03-28T00:05:08.320Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/6/","workerid":2}} {"timestamp":"2025-03-28T00:05:08.320Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/fructose+liver+damage/page/3/","workerid":4}} {"timestamp":"2025-03-28T00:05:08.351Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/page/4/?s=inflammation+healing+response+tissue+nutrients","workerid":1}} {"timestamp":"2025-03-28T00:05:28.247Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":3}} 
{"timestamp":"2025-03-28T00:05:28.247Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":3,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 3)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-28T00:05:28.321Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":0}} {"timestamp":"2025-03-28T00:05:28.321Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":0,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at listOnTimeout (node:internal/timers:545:9)\n at process.processTimers (node:internal/timers:519:7)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 0)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)"}} {"timestamp":"2025-03-28T00:05:28.322Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":2}} {"timestamp":"2025-03-28T00:05:28.322Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":2,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage 
(file:///app/dist/util/worker.js:95:27)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at listOnTimeout (node:internal/timers:545:9)\n at process.processTimers (node:internal/timers:519:7)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 2)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)"}} {"timestamp":"2025-03-28T00:05:28.322Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":4}} {"timestamp":"2025-03-28T00:05:28.322Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":4,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 4)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-28T00:05:28.353Z","logLevel":"warn","context":"worker","message":"New Window Timed Out","details":{"seconds":20,"workerid":1}} {"timestamp":"2025-03-28T00:05:28.353Z","logLevel":"warn","context":"worker","message":"Error getting new page","details":{"workerid":1,"type":"exception","message":"timed out","stack":"Error: timed out\n at PageWorker.initPage (file:///app/dist/util/worker.js:95:27)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 1)\n at async 
runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1"}} {"timestamp":"2025-03-28T00:05:28.748Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/fructose+liver+damage/page/2/","workerid":3}} {"timestamp":"2025-03-28T00:05:28.748Z","logLevel":"error","context":"worker","message":"Worker error, exiting","details":{"type":"exception","message":"no page available, shouldn't get here","stack":"Error: no page available, shouldn't get here\n at PageWorker.initPage (file:///app/dist/util/worker.js:161:15)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 3)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1","workerid":3}} {"timestamp":"2025-03-28T00:05:28.822Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/2/","workerid":0}} {"timestamp":"2025-03-28T00:05:28.823Z","logLevel":"error","context":"worker","message":"Worker error, exiting","details":{"type":"exception","message":"no page available, shouldn't get here","stack":"Error: no page available, shouldn't get here\n at PageWorker.initPage (file:///app/dist/util/worker.js:161:15)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at listOnTimeout (node:internal/timers:545:9)\n at process.processTimers (node:internal/timers:519:7)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n 
at async Promise.allSettled (index 0)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)","workerid":0}} {"timestamp":"2025-03-28T00:05:28.823Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/inflammation+healing+response+tissue+nutrients/page/6/","workerid":2}} {"timestamp":"2025-03-28T00:05:28.823Z","logLevel":"error","context":"worker","message":"Worker error, exiting","details":{"type":"exception","message":"no page available, shouldn't get here","stack":"Error: no page available, shouldn't get here\n at PageWorker.initPage (file:///app/dist/util/worker.js:161:15)\n at runNextTicks (node:internal/process/task_queues:60:5)\n at listOnTimeout (node:internal/timers:545:9)\n at process.processTimers (node:internal/timers:519:7)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 2)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)","workerid":2}} {"timestamp":"2025-03-28T00:05:28.823Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/search/fructose+liver+damage/page/3/","workerid":4}} {"timestamp":"2025-03-28T00:05:28.823Z","logLevel":"error","context":"worker","message":"Worker error, exiting","details":{"type":"exception","message":"no page available, shouldn't get here","stack":"Error: no page available, shouldn't get here\n at PageWorker.initPage (file:///app/dist/util/worker.js:161:15)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async 
Promise.allSettled (index 4)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1","workerid":4}} {"timestamp":"2025-03-28T00:05:28.853Z","logLevel":"warn","context":"worker","message":"Retrying getting new page","details":{"page":"https://bartoll.se/page/4/?s=inflammation+healing+response+tissue+nutrients","workerid":1}} {"timestamp":"2025-03-28T00:05:28.853Z","logLevel":"error","context":"worker","message":"Worker error, exiting","details":{"type":"exception","message":"no page available, shouldn't get here","stack":"Error: no page available, shouldn't get here\n at PageWorker.initPage (file:///app/dist/util/worker.js:161:15)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:233:30)\n at async PageWorker.run (file:///app/dist/util/worker.js:206:13)\n at async Promise.allSettled (index 1)\n at async runWorkers (file:///app/dist/util/worker.js:283:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1053:9)\n at async Crawler.run (file:///app/dist/crawler.js:359:13)\n at async file:///app/dist/main.js:58:1","workerid":1}} {"timestamp":"2025-03-28T00:05:28.994Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmppv4byqoe/collections/crawl-20250327105418370/crawls/crawl-20250328000528-cabeb0e422c0.yaml","details":{}} {"timestamp":"2025-03-28T00:05:29.007Z","logLevel":"info","context":"crawlStatus","message":"Crawl 
statistics","details":{"crawled":19264,"total":20765,"pending":5,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-03-28T00:04:47.818Z\",\"extraHops\":0,\"url\":\"https:\\/\\/bartoll.se\\/search\\/blood+glucose+diabetes+type+2+damage\\/page\\/2\\/\",\"added\":\"2025-03-27T23:10:56.786Z\",\"depth\":6}","{\"seedId\":0,\"started\":\"2025-03-28T00:04:47.818Z\",\"extraHops\":0,\"url\":\"https:\\/\\/bartoll.se\\/search\\/plant+based+foods+toxic+tissue+damage\\/page\\/21\\/\",\"added\":\"2025-03-27T23:10:55.333Z\",\"depth\":6}","{\"seedId\":0,\"started\":\"2025-03-28T00:04:47.818Z\",\"extraHops\":0,\"url\":\"https:\\/\\/bartoll.se\\/search\\/blood+glucose+diabetes+type+2+damage\\/page\\/7\\/\",\"added\":\"2025-03-27T23:10:56.786Z\",\"depth\":6}","{\"seedId\":0,\"started\":\"2025-03-28T00:04:47.851Z\",\"extraHops\":0,\"url\":\"https:\\/\\/bartoll.se\\/page\\/19\\/?s=plant+based+foods+toxic+tissue+damage\",\"added\":\"2025-03-27T23:10:56.969Z\",\"depth\":6}","{\"seedId\":0,\"started\":\"2025-03-28T00:04:47.715Z\",\"extraHops\":0,\"url\":\"https:\\/\\/bartoll.se\\/search\\/plant+based+foods+toxic+tissue+damage\\/page\\/2\\/\",\"added\":\"2025-03-27T23:10:55.333Z\",\"depth\":6}"]}} {"timestamp":"2025-03-28T00:05:29.010Z","logLevel":"info","context":"general","message":"Crawling done","details":{}} {"timestamp":"2025-03-28T00:05:29.016Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
In the end, I have 14 .warc.gz archives, and I'm not sure whether I can do anything with them. Can I combine them somehow so I can still use what has been crawled?
Thanks
Looks like something bad is happening
Can I combine them somehow to still use what has been crawled?
I just documented two important points on this at https://github.com/openzim/zimit/wiki/Frequently-Asked-Questions#can-i-resume-an-interrupted-crawl and https://github.com/openzim/zimit/wiki/Frequently-Asked-Questions#how-to-convert-warc-files-into-zim
Is resuming an interrupted crawl possible with zimit or browsertrix? I am using zimit 2.1.7, but there is no crawls folder or crawl file, and no --config either. How do I set that up from the start? I am on Windows 10 Pro. Can I resume after a power outage?
Yes you can. Did you pass --keep to zimit? Did you check for hidden folders? --config has been supported in zimit for 2 years or so. I just updated the doc mentioned above to add these details.
Yes, I always pass --keep, and I also checked for hidden items: nothing new. There is no crawls folder or YAML file. Is there something I am missing in the zimit parameters?
An example of my command:
docker run -v %cd%/my-imp:/my-imp --name zimit_multaqa_hdeeth ghcr.io/openzim/zimit:latest zimit --output="/my-imp" --scopeType="custom"
--include="^(https://)al-maktaba.org/(book/31617).*" --favicon="https://old.shamela.ws/files/img/front/icon.png"
--name="al-maktaba.org_multaqa-ahl-hdeeth-p5" --title="ملتقي أهل الحديث" --url="https://al-maktaba.org/book/31617" --description="أرشيف - ملتقي أهل الحديث؛ الجزء الخامس"
--keep --zim-lang="ara" --diskUtilization 0 --verbose --statsFilename="/output/task_progress.json"
I'm sorry, I was probably wrong when saying that --config is necessary to resume. To be honest, I have never tried to resume a failed crawl, so I will have to investigate. Right now I don't see how you can resume a crawl with zimit. Resuming the crawl with only browsertrix crawler is probably OK: you just pass it the same .tmpxxx folder which has been used previously.
I'm sorry, I'm probably wrong when saying that --config is necessary to resume. Tbh, I never tried to resume a failed crawl, I will have to investigate. I now don't get how you can resume a crawl with zimit. Resuming the crawl with only browsertrix crawler is probably ok, you just pass it the same .tmpxxx folder which has been used previously.
Is my problem related to resuming crawling? The thread got kinda hijacked by hamoudak2, and you changed the title as well, I hope my issue is still being investigated?
I believed your issue was about how to resume this interrupted crawl, but it looks like I was wrong.
If your issue is that the crawl stopped, unfortunately there is not much we can do. From the logs, it looks like the browser (used by the crawler) timed out while fetching the next page. The cause could be anything, from a software bug in the browser to a problem at the website level. It won't be fixed or investigated unless you manage to narrow down when the problem happens. If it is purely random, then we don't have sufficient info to reproduce and fix it.
@adrian52999 If you just want to combine the WARCs, a quick way to do a simple, raw merge on a Windows system is the command: copy /b file1.warc.gz + file2.warc.gz + file3.warc.gz combined.warc.gz, using the full path for each file. On other systems, use the command: cat *.warc.gz > ./combined/all.warc.gz (see https://forum.webrecorder.net/t/replaying-one-capture-that-is-broken-down-into-many-warcs/139/2). There is also a feature in browsertrix for this: --combineWARC.
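The raw merge described above can also be sketched in Python, which behaves the same on Windows and other systems. This is only a minimal illustration, not something from zimit or browsertrix: the function name `concat_warcs` and the example paths are hypothetical, and it relies on the gzip format allowing several compressed members to follow one another in a single file.

```python
import shutil
from pathlib import Path


def concat_warcs(warc_paths, combined_path):
    """Append each input file to combined_path, byte for byte, in the given order."""
    combined_path = Path(combined_path)
    combined_path.parent.mkdir(parents=True, exist_ok=True)
    with open(combined_path, "wb") as out:
        for path in warc_paths:
            if Path(path).resolve() == combined_path.resolve():
                continue  # never feed the output file back into itself
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, so large WARCs are fine
    return combined_path


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever your .warc.gz files live.
    inputs = sorted(Path("archive").glob("*.warc.gz"))
    if inputs:
        print(concat_warcs(inputs, Path("combined") / "all.warc.gz"))
```

Like the cat example, the output is written to a separate folder so it cannot be picked up as one of its own inputs.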
For structured merging, I have read about the Python "warcio" library and FastWARC, but I have never used either of them. Also, see if you can reduce the number of workers to 2.
I am not able to resume an interrupted crawl: the .yaml file is not found. There is no 'crawls' directory and no 'crawl-xxxxxxxxxxxxxx-xxxxxxxxxxxx.yaml' file inside my 'collections' directory or its sub-directories.
docker run -v /home/user/Documents:/output ghcr.io/openzim/zimit zimit --seeds https://example.com/ --name zim-of-Example --keep --output /output
Note: https://example.com/ is not the actual website I am crawling, it's just a placeholder.
@TechSavvy-313 If you had read the previous comments, you would know that "Resuming the crawl with only browsertrix crawler is probably ok, you just pass it the same .tmpxxx folder which has been used previously". So this is done in browsertrix crawler, and the parameter to set is --saveState always. Also read the tutorials here to learn more about it: https://www.sucho.org/browsertrix and a 20-minute video: https://rb.gy/fr11pz
https://crawler.docs.browsertrix.com/user-guide/common-options/#saving-crawl-state-interrupting-and-restarting-the-crawl
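When it is unclear whether a save-state file exists at all, a small script can search the kept temporary folders for one. The sketch below assumes the layout seen in the "Saving crawl state to:" log lines in this thread (`.tmp*/collections/<crawl-id>/crawls/*.yaml` under the output directory) and that the output directory is reachable from the host. The function name `find_latest_save_state` is hypothetical.

```python
from pathlib import Path


def find_latest_save_state(output_dir):
    """Return the most recently modified save-state YAML under output_dir, or None."""
    # Layout assumed from the log lines in this thread, not from documentation.
    pattern = ".tmp*/collections/*/crawls/*.yaml"
    candidates = list(Path(output_dir).glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # "/output" is the mount point used in the commands above; adjust as needed.
    print(find_latest_save_state("/output"))
```

A result of None means no save-state was written in any kept folder, which matches the situation reported above.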
@benoit74 I found that there was a "crawls" folder as you said, but only in an older crawl from 2024-11, not in the current ones. I have one crawl that wasn't completed; this was before disk utilization was fixed. Or it could be that the crawler saves state only in this particular situation, I don't know. Maybe you can add this feature, "resuming interrupted crawls"; it would make zimit more practical.
05T05:49:58.316Z","logLevel":"info","context":"general","message":"Disk utilization threshold reached 90% > 90%, stopping","details":{}} {"timestamp":"2024-11-05T05:49:58.316Z","logLevel":"info","context":"general","message":"Crawler interrupted, gracefully finishing current pages","details":{}} {"timestamp":"2024-11-05T05:49:58.317Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}} {"timestamp":"2024-11-05T05:51:29.823Z","logLevel":"info","context":"general","message":"Saving crawl state to: /my-forum/.tmp1e1x4_f3/collections/crawl-20241104065145608/crawls/crawl-20241105055058-e2d2ea6e18ce.yaml","details":{}} {"timestamp":"2024-11-
I will comment here since this is related to continuing an interrupted crawl. I had a crawl that failed a few times. The reason is still unclear, but basically the browser crashes and stops responding after some hours.
I had these settings applied:
--keep --saveState always
End of the logs from the first attempt that failed:
{"timestamp":"2025-06-16T21:52:31.995Z","logLevel":"error","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Attempted to use detached Frame '1833EE64863D6B2DB18B7A5B007519AF'.","stack":"Error: Attempted to use detached Frame '1833EE64863D6B2DB18B7A5B007519AF'.\n at CdpFrame.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/util/decorators.js:99:23)\n at CdpPage.title (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/api/Page.js:1088:43)\n at Crawler.crawlPage (file:///app/dist/crawler.js:727:42)\n at async PageWorker.crawlPage (file:///app/dist/util/worker.js:166:21)\n at async PageWorker.timedCrawlPage (file:///app/dist/util/worker.js:182:13)\n at async PageWorker.runLoop (file:///app/dist/util/worker.js:237:17)\n at async PageWorker.run (file:///app/dist/util/worker.js:208:13)\n at async Promise.allSettled (index 0)\n at async runWorkers (file:///app/dist/util/worker.js:285:5)\n at async Crawler.crawl (file:///app/dist/crawler.js:1094:9)","page":"https://nasze.fm/news,48604","workerid":0}}
{"timestamp":"2025-06-16T21:52:31.999Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://nasze.fm/news,48604","workerid":0}}
{"timestamp":"2025-06-16T21:52:32.081Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2025-06-16T21:54:44.085Z","logLevel":"error","context":"recorder","message":"Finishing Fetch Timed Out","details":{"seconds":132,"page":"https://nasze.fm/news,48604","workerid":0}}
{"timestamp":"2025-06-16T21:54:44.500Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmpodnxde1w/collections/crawl-20250616071029291/crawls/20250616215444091-ef779d1f84e1-crawl-20250616071029291.yaml","details":{}}
{"timestamp":"2025-06-16T21:54:44.513Z","logLevel":"info","context":"general","message":"Removing old save-state: /output/.tmpodnxde1w/collections/crawl-20250616071029291/crawls/20250616213119761-ef779d1f84e1-crawl-20250616071029291.yaml","details":{}}
{"timestamp":"2025-06-16T21:54:44.520Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":14349,"total":15848,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2025-06-16T21:54:44.523Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-06-16T21:54:44.525Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
[zimit::2025-06-16 21:54:44,763] ERROR:Crawl returned an error: 10, scraper exiting
[zimit::2025-06-16 21:54:44,773] INFO:Temporary files have been kept in /output/.tmpodnxde1w, please clean them up manually once you don't need them an
Then I continued with
--config /output/.tmpodnxde1w/collections/crawl-20250616071029291/crawls/20250616215444091-ef779d1f84e1-crawl-20250616071029291.yaml
The crawl got picked up correctly and continued, and then failed again:
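Finding the right path to pass to --config means reading the "Saving crawl state to:" line near the end of each run. A minimal sketch of pulling that path and the final crawl status out of a saved log follows. It assumes the log has one JSON object per line, as in the excerpts in this comment, and skips anything that does not parse, such as the `[zimit::...]` lines. The function name `summarize_crawl_log` is hypothetical.

```python
import json

SAVE_PREFIX = "Saving crawl state to: "
STATUS_PREFIX = "Exiting, Crawl status: "


def summarize_crawl_log(lines):
    """Return (last save-state path, final crawl status) found in JSON log lines."""
    save_state = None
    status = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a crawler JSON line, e.g. zimit's own output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise unparsable line
        message = entry.get("message", "")
        if message.startswith(SAVE_PREFIX):
            save_state = message[len(SAVE_PREFIX):]  # later lines overwrite earlier ones
        elif message.startswith(STATUS_PREFIX):
            status = message[len(STATUS_PREFIX):]
    return save_state, status


if __name__ == "__main__":
    import sys

    # Usage sketch: python summarize.py crawl.log  (the file name is an example)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as log_file:
            print(summarize_crawl_log(log_file))
```

Keeping only the last matching line matters because the crawler removes older save-state files as it writes new ones, as the "Removing old save-state" lines show.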
{"timestamp":"2025-06-17T22:34:44.504Z","logLevel":"warn","context":"recorder","message":"Failed to load response body","details":{"url":"https://airly.org/widget/v2/?width=280&height=380&displayMeasurements=false&displayCAQI=true&autoHeight=true&autoWidth=true&language=pl&indexType=AIRLY_AQI&unitSpeed=metric&unitTemperature=celsius&latitude=51.5943025&longitude=18.737161","networkId":"2C258938CE1809D40CC78DDAE07994BB","type":"exception","message":"Protocol error (Fetch.getResponseBody): Target closed","stack":"TargetCloseError: Protocol error (Fetch.getResponseBody): Target closed\n at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:77:36)\n at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:106:25)\n at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:171:21)\n at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:43:30)\n at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n at WebSocket.emit (node:events:524:28)\n at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n at Socket.emit (node:events:524:28)","page":"https://nasze.fm/news,46946","workerid":0}}
{"timestamp":"2025-06-17T22:34:44.605Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed: will retry","details":{"retry":0,"retries":2,"msg":"Navigating frame was detached","url":"https://nasze.fm/news,46946","loadState":0,"page":"https://nasze.fm/news,46946","workerid":0}}
{"timestamp":"2025-06-17T22:34:44.650Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2025-06-17T22:34:45.389Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmp6q6jz_2k/collections/crawl-20250617111138210/crawls/20250617223444849-33a17afb97e9-crawl-20250617111138210.yaml","details":{}}
{"timestamp":"2025-06-17T22:34:45.408Z","logLevel":"info","context":"general","message":"Removing old save-state: /output/.tmp6q6jz_2k/collections/crawl-20250617111138210/crawls/20250617221214154-33a17afb97e9-crawl-20250617111138210.yaml","details":{}}
{"timestamp":"2025-06-17T22:34:45.414Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":29403,"total":31046,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2025-06-17T22:34:45.419Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-06-17T22:34:45.423Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
[zimit::2025-06-17 22:34:45,559] ERROR:Crawl returned an error: 10, scraper exiting
[zimit::2025-06-17 22:34:45,569] INFO:Temporary files have been kept in /output/.tmp6q6jz_2k, please clean them up manually once you don't need them anymore
I continued again with
--config /output/.tmp6q6jz_2k/collections/crawl-20250617111138210/crawls/20250617223444849-33a17afb97e9-crawl-20250617111138210.yaml
Then it failed again (but somehow made it to the end with status done, with errors):
{"timestamp":"2025-06-19T20:02:14.691Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmp5njm1yvo/collections/crawl-20250618110549846/crawls/20250619200213770-6a7f50b767d5-crawl-20250618110549846.yaml","details":{}}
{"timestamp":"2025-06-19T20:02:14.718Z","logLevel":"info","context":"general","message":"Removing old save-state: /output/.tmp5njm1yvo/collections/crawl-20250618110549846/crawls/20250619193734843-6a7f50b767d5-crawl-20250618110549846.yaml","details":{}}
{"timestamp":"2025-06-19T20:02:14.730Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":45803,"total":46061,"pending":0,"failed":258,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2025-06-19T20:02:14.734Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-06-19T20:02:14.738Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: done","details":{}}
[zimit::2025-06-19 20:02:14,906] INFO:
[zimit::2025-06-19 20:02:14,909] INFO:----------
[zimit::2025-06-19 20:02:14,909] INFO:Processing WARC files in/at /output/.tmp5njm1yvo/collections/crawl-20250618110549846/archive
[zimit::2025-06-19 20:02:14,911] INFO:Calling warc2zim with these args: ['--name', 'naszeFm_limit', '--tags', 'sieradz', '--scraper-suffix', 'zimit 3.0.5', '--output', '/output', '--url', 'https://nasze.fm', '--title', 'nasze.fm', '--description', 'regionalny portal informacyjny', '/output/.tmp5njm1yvo/collections/crawl-20250618110549846/archive']
[warc2zim::2025-06-19 20:06:03,282] INFO:Expecting 23284 ZIM entries to files
[warc2zim::2025-06-19 20:06:03,285] ERROR:Unable to find WARC record for main page: ZimPath(nasze.fm/), aborting
[zimit::2025-06-19 20:06:03,299] INFO:Temporary files have been kept in /output/.tmp5njm1yvo, please clean them up manually once you don't need them anymore
What I see now is that the ZIM cannot be generated because of:
[warc2zim::2025-06-19 20:06:03,285] ERROR:Unable to find WARC record for main page: ZimPath(nasze.fm/), aborting
I think the issue is related to the fact that when the crawler continues after an interrupt, it creates new .tmp* folders, and these are not combined, so it cannot find the main page entry?
Output folder layout:
total 24K
drwxr-xr-x 6 root root 4,0K cze 19 22:02 .
drwxr-xr-x 4 root root 4,0K cze 16 09:10 ..
drwxr-xr-x 2 root root 4,0K cze 16 09:10 fails
drwx------ 3 root root 4,0K cze 18 13:05 .tmp5njm1yvo
drwx------ 3 root root 4,0K cze 17 13:11 .tmp6q6jz_2k
drwx------ 3 root root 4,0K cze 16 09:10 .tmpodnxde1w
Anyway, it seems that continuing is possible, but the crawls are not combined at the end. Or am I using --config incorrectly?
After reading more, it seems related to this: https://github.com/openzim/zimit/issues/499
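One way to check the hypothesis above is to look for the main page record in the WARCs of every kept .tmp* folder. The sketch below is a rough check under several assumptions: the WARCs sit in `.tmp*/collections/<crawl-id>/archive/` (taken from the "Processing WARC files in/at" log line), the main page was recorded exactly as `https://nasze.fm/`, and the record header is written as a plain `WARC-Target-URI: <url>` line. It reads every line of each file, so a matching line inside a page body would count as a hit too. The function name `warcs_containing` is hypothetical.

```python
import gzip
from pathlib import Path


def warcs_containing(output_dir, target_uri):
    """List .warc.gz files under any .tmp* folder that hold a record for target_uri."""
    needle = b"WARC-Target-URI: " + target_uri.encode("utf-8")
    pattern = ".tmp*/collections/*/archive/*.warc.gz"  # layout assumed from the logs
    matches = []
    for warc in sorted(Path(output_dir).glob(pattern)):
        # gzip.open reads through all concatenated gzip members of a .warc.gz
        with gzip.open(warc, "rb") as stream:
            for line in stream:
                if line.rstrip(b"\r\n") == needle:
                    matches.append(warc)
                    break  # one hit per file is enough
    return matches


if __name__ == "__main__":
    for found in warcs_containing("/output", "https://nasze.fm/"):
        print(found)
```

If the main page only shows up under the first .tmp* folder, that would be consistent with warc2zim being given just the last folder's archive directory, as in the "Calling warc2zim with these args" line.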