Constantly getting TargetCloseError (ProtocolError) on a specific page I want to ZIM
Hello. I'm using a fairly recent zimit version:
ghcr.io/openzim/zimit latest 24d0e3419bf1 5 weeks ago 3.81GB
I'm trying to crawl a site. I tried changing parameters and also added delays to be polite, but I still cannot figure out whether I'm simply banned. While the errors occur, I can still open the affected pages in a separate browser.
Also, when I open the page, I see that requests to some domains time out, for example:
https://hg1.hitbox.com/HG?hc=w153&l=y&hb=WQ500615O5SF28EN0&cd=1&n=DATABASE
I tried to mitigate this by not waiting for pages to fully load (--waitUntil domcontentloaded).
This is roughly my list of arguments (trimmed to the important ones):
--pageLoadTimeout 15 --behaviorTimeout 31 --waitUntil domcontentloaded --pageExtraDelay 5 --workers 1
These are the logs around the moment I start receiving the strange ProtocolError:
{"timestamp":"2025-05-20T12:20:55.314Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":487,"total":1377,"pending":1,"failed":0,"limit":{"max":10000,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-05-20T12:20:55.309Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/oldsite\\/gameofweek\\/17\\/util_speech.htm\",\"added\":\"2025-05-20T10:13:06.963Z\",\"depth\":2}"]}}
{"timestamp":"2025-05-20T12:21:10.330Z","logLevel":"error","context":"fetch","message":"Direct fetch of page URL timed out","details":{"seconds":15,"page":"https://gb64.com/oldsite/gameofweek/17/util_speech.htm","workerid":0}}
{"timestamp":"2025-05-20T12:21:10.366Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://gb64.com/forum/search.php?search_id=unanswered&sid=87de961a338d260a59775a7bfaea55cc"}}
{"timestamp":"2025-05-20T12:21:10.371Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":487,"total":1377,"pending":1,"failed":0,"limit":{"max":10000,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-05-20T12:21:10.365Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/search.php?search_id=unanswered&sid=87de961a338d260a59775a7bfaea55cc\",\"added\":\"2025-05-20T10:13:26.133Z\",\"depth\":2}"]}}
{"timestamp":"2025-05-20T12:21:25.387Z","logLevel":"error","context":"fetch","message":"Direct fetch of page URL timed out","details":{"seconds":15,"page":"https://gb64.com/forum/search.php?search_id=unanswered&sid=87de961a338d260a59775a7bfaea55cc","workerid":0}}
{"timestamp":"2025-05-20T12:21:25.421Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://gb64.com/forum/search.php?search_id=active_topics&sid=87de961a338d260a59775a7bfaea55cc"}}
{"timestamp":"2025-05-20T12:21:25.423Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":487,"total":1377,"pending":1,"failed":0,"limit":{"max":10000,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-05-20T12:21:25.420Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/search.php?search_id=active_topics&sid=87de961a338d260a59775a7bfaea55cc\",\"added\":\"2025-05-20T10:13:26.136Z\",\"depth\":2}"]}}
{"timestamp":"2025-05-20T12:21:40.440Z","logLevel":"error","context":"fetch","message":"Direct fetch of page URL timed out","details":{"seconds":15,"page":"https://gb64.com/forum/search.php?search_id=active_topics&sid=87de961a338d260a59775a7bfaea55cc","workerid":0}}
{"timestamp":"2025-05-20T12:21:40.470Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://gb64.com/forum/search.php?sid=87de961a338d260a59775a7bfaea55cc"}}
{"timestamp":"2025-05-20T12:21:40.472Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":487,"total":1377,"pending":1,"failed":0,"limit":{"max":10000,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-05-20T12:21:40.469Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/search.php?sid=87de961a338d260a59775a7bfaea55cc\",\"added\":\"2025-05-20T10:13:26.139Z\",\"depth\":2}"]}}
{"timestamp":"2025-05-20T12:21:55.488Z","logLevel":"error","context":"fetch","message":"Direct fetch of page URL timed out","details":{"seconds":15,"page":"https://gb64.com/forum/search.php?sid=87de961a338d260a59775a7bfaea55cc","workerid":0}}
{"timestamp":"2025-05-20T12:21:55.518Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://gb64.com/forum/app.php/help/faq?sid=87de961a338d260a59775a7bfaea55cc"}}
{"timestamp":"2025-05-20T12:21:55.523Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":487,"total":1377,"pending":1,"failed":0,"limit":{"max":10000,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-05-20T12:21:55.517Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/app.php\\/help\\/faq?sid=87de961a338d260a59775a7bfaea55cc\",\"added\":\"2025-05-20T10:13:26.140Z\",\"depth\":2}"]}}
{"timestamp":"2025-05-20T12:22:10.539Z","logLevel":"error","context":"fetch","message":"Direct fetch of page URL timed out","details":{"seconds":15,"page":"https://gb64.com/forum/app.php/help/faq?sid=87de961a338d260a59775a7bfaea55cc","workerid":0}}
{"timestamp":"2025-05-20T12:22:10.626Z","logLevel":"warn","context":"recorder","message":"Error getting cookies","details":{"page":"https://gb64.com/oldsite/gameofweek/6/gotw_scoop!.htm","e":{"name":"TargetCloseError","cause":{"name":"ProtocolError"}}}}
{"timestamp":"2025-05-20T12:22:10.627Z","logLevel":"warn","context":"recorder","message":"Error getting cookies","details":{"page":"https://gb64.com/oldsite/gameofweek/7/gotw_adventureconstrset.htm","e":{"name":"TargetCloseError","cause":{"name":"ProtocolError"}}}}
{"timestamp":"2025-05-20T12:22:10.627Z","logLevel":"warn","context":"recorder","message":"Error getting cookies","details":{"page":"https://gb64.com/oldsite/gameofweek/7/gotw_robinofsherwood.htm","e":{"name":"TargetCloseError","cause":{"name":"ProtocolError"}}}}
{"timestamp":"2025-05-20T12:22:10.628Z","logLevel":"warn","context":"recorder","message":"Error getting cookies","details":{"page":"https://gb64.com/oldsite/gameofweek/7/gotw_starcross.htm","e":{"name":"TargetCloseError","cause":{"name":"ProtocolError"}}}}
{"timestamp":"2025-05-20T12:22:10.629Z","logLevel":"warn","context":"recorder","message":"Error getting cookies","details":{"page":"https://gb64.com/oldsite/gameofweek/7/gotw_wizardandprincess.htm","e":{"name":"TargetCloseError","cause":{"name":"ProtocolError"}}}}
{"timestamp":"2025-05-20T12:22:10.630Z","logLevel":"warn","context":"recorder","message":"Error getting cookies","details":{"page":"https://gb64.com/oldsite/gameofweek/8/gotw_magiciansball.htm","e":{"name":"TargetCloseError","cause":{"name":"ProtocolError"}}}}
I also got a single, more informative error:
{"timestamp":"2025-05-20T12:22:10.767Z","logLevel":"warn","context":"behavior","message":"Behavior run partially failed","details":{"reason":{"type":"exception","message":"Protocol error (Runtime.evaluate): Target closed","stack":"TargetCloseError: Protocol error (Runtime.evaluate): Target closed\n at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:77:36)\n at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:106:25)\n at Connection.onMessage (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:130:25)\n at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:38:32)\n at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n at WebSocket.onMessage (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:209:9)\n at WebSocket.emit (node:events:524:28)\n at Receiver.receiverOnMessage (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1220:20)\n at Receiver.emit (node:events:524:28)\n at Immediate.<anonymous> (/app/node_modules/puppeteer-core/node_modules/ws/lib/receiver.js:601:16)"},"page":"https://gb64.com/oldsite/gameofweek/6/gotw_redmoon.htm","workerid":0}}
{"timestamp":"2025-05-20T12:22:15.809Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"http://gb64.com/"}}
It seems that before I get this ProtocolError, I get a lot of timeouts, and the crawl seems to be stuck on the statistics (the crawled count stays at 487).
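A minimal sketch for spotting the "stuck on statistics" symptom automatically, assuming the crawler's JSON log entries are saved one per line (the function name and threshold are hypothetical). It checks whether the "crawled" count in successive crawlStatus entries stops increasing:

```python
import json
from typing import Iterable, Optional


def find_stall(log_lines: Iterable[str], max_repeats: int = 3) -> Optional[int]:
    """Return the "crawled" count if it stays the same for `max_repeats`
    further consecutive crawlStatus entries; return None otherwise."""
    last_crawled = None
    repeats = 0
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output such as zimit's own INFO lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip entries that are truncated or wrapped across lines
        if entry.get("context") != "crawlStatus":
            continue
        crawled = entry.get("details", {}).get("crawled")
        if crawled is None:
            continue
        if crawled == last_crawled:
            repeats += 1
            if repeats >= max_repeats:
                return crawled  # no progress across several status reports
        else:
            last_crawled = crawled
            repeats = 0
    return None
```

In the log above, five consecutive crawlStatus entries report "crawled":487, so with the default threshold this would return 487. It only detects the stall; it says nothing about whether the cause is a ban or a crashed browser.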
So am I simply banned, or is there an issue in the crawl, perhaps caused by this hanging request to hg1.hitbox.com?
To me this looks more like a communication problem between the crawler and the Chromium browser, which might have a variety of causes; basically, the browser seems to have crashed. It could be that the website uses too many resources, exhausting memory due to bad JS, ...
For now I can confirm that the problem is in the subpages.
On some pages, the layout links to an old page and the crawler follows it.
As written above, an example is https://gb64.com/oldsite/gameofweek/6/gotw_redmoon.htm
These old pages do not look very complex. They are pretty old, but they contain many hits that never finish, like:
https://hg1.hitbox.com/HG?hc=w153&l=y&hb=WQ500615O5SF28EN0&cd=1&n=DATABASE
So, as you wrote, either:
- the page is complex and crashes the browser at some point, or
- these dangling requests eventually crash the browser, like
https://hg1.hitbox.com/HG?hc=w153&l=y&hb=WQ500615O5SF28EN0&cd=1&n=DATABASE
When I excluded it with:
--scopeExcludeRx='.*oldsite.*'
the crawl continues without any issues.
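To check in advance which URLs such an exclusion catches, here is a small sketch assuming the crawler tests the --scopeExcludeRx value as an unanchored regex search against each discovered page URL (that matching behaviour is an assumption; the sample URLs come from this thread):

```python
import re

# The value passed via --scopeExcludeRx above.
EXCLUDE = re.compile(r".*oldsite.*")


def is_excluded(url: str) -> bool:
    # Assumption: the exclusion is an unanchored search on the full URL.
    return EXCLUDE.search(url) is not None


for url in [
    "https://gb64.com/oldsite/gameofweek/6/gotw_redmoon.htm",
    "https://gb64.com/forum/viewtopic.php?p=19228",
]:
    print(url, "excluded" if is_excluded(url) else "crawled")
```

Under search semantics the leading and trailing ".*" are redundant: "oldsite" alone selects the same URLs, and a tighter pattern such as gb64\.com/oldsite/ would avoid excluding unrelated URLs that merely contain the word.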
I assume there is no way to prevent specific URLs from loading while a page is being loaded during the crawl?
You should be able to block any request from a page to hitbox.com with blockRules:
--blockRules    Additional rules for blocking certain URLs from being loaded, by URL regex and optionally via text match in an iframe    [array] [default: []]
I have never used them myself, so I'm not certain which format they should have, but this is supposed to work.
Some documentation is available at https://crawler.docs.browsertrix.com/user-guide/crawl-scope/#page-resource-block-rules
To be honest, I'm not sure whether I'm using --blockRules correctly or whether it is not working as expected, as in this example: https://github.com/webrecorder/browsertrix-crawler/issues/574
It seems it blocks all resources.
I use it like this:
--blockRules ['https://hg1.hitbox.com','https://cloud.cbm8bit.com']
but I also tried some other options.
It seems the parameters are passed:
[zimit::2025-05-24 15:43:32,252] INFO:Running browsertrix-crawler crawl: crawl --title gamebase 64 --description An attempt to document ALL Commodore 64 gameware before its too late --workers 2 --waitUntil domcontentloaded --depth 1 --pageLoadTimeout 91 --blockRules [https://hg1.hitbox.com,https://cloud.cbm8bit.com] --behaviorTimeout 31 --diskUtilization 90 --seeds https://gb64.com --userAgentSuffix +Zimit --cwd /output/.tmp2ljdqp33
Then I get errors:
{"timestamp":"2025-05-24T15:43:40.264Z","logLevel":"warn","context":"blocking","message":"Block rule match for page request ignored, set --exclude to block full pages","details":{"url":"https://gb64.com/","page":"about:blank?_browsertrixh17wu74mqz"}}
{"timestamp":"2025-05-24T15:43:40.631Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://gb64.com/","frameId":"695F1629F140DC66E164FE250ABE7749"}}
{"timestamp":"2025-05-24T15:43:42.168Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://gb64.com/images/c64top/cornerfiller.gif","errorText":"net::ERR_BLOCKED_BY_CLIENT.Inspector","type":"Image","status":0,"page":"https://gb64.com/","workerid":0}}
Basically, I get a lot of ERR_BLOCKED_BY_CLIENT errors, and the pages are there but without any resources.
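A possible explanation for everything being blocked, assuming each --blockRules value is used as an unanchored URL regex (the help text says "by URL regex"): the run command shows the value arrived as one literal string that still includes the square brackets, and in regex syntax square brackets form a character class. This Python sketch shows what such a pattern matches:

```python
import re

# The single value that reached the crawler in the logged run command:
# the brackets were kept and nothing split it into two rules.
rule = "[https://hg1.hitbox.com,https://cloud.cbm8bit.com]"

# As a regex, "[...]" is a character class: it matches ANY ONE of the
# characters inside it (h, t, p, s, :, /, g, 1, ., i, b, o, x, c, m, ...).
pattern = re.compile(rule)

for url in ["https://gb64.com/", "https://gb64.com/images/c64top/cornerfiller.gif"]:
    # Each URL contains at least one of those characters, so it "matches".
    print(url, pattern.search(url) is not None)
```

Both lines print True, which would be consistent with the "Block rule match for page request ignored" warning for https://gb64.com/ and with images on gb64.com failing with ERR_BLOCKED_BY_CLIENT.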
🤔
Yeah, maybe I wrote too fast. It seems that passing --blockRules twice, once per URL, does the trick, and it appears to work:
--blockRules 'https://hg1.hitbox.com' --blockRules 'https://cloud.cbm8bit.com'
I will do a bigger crawl to confirm whether this helps and prevents the browser from crashing on the oldsite problem mentioned earlier...
Unfortunately, the browser still seems to crash at some point:
{"timestamp":"2025-05-24T16:49:56.208Z","logLevel":"warn","context":"behavior","message":"Behaviors timed out","details":{"seconds":31,"page":"https://gb64.com/oldsite/gameofweek/4/americanfeature/gotw_stripoker.htm","workerid":1}}
{"timestamp":"2025-05-24T16:49:58.716Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":3,"page":"https://gb64.com/oldsite/gameofweek/4/americanfeature/gotw_stripoker.htm","workerid":1}}
{"timestamp":"2025-05-24T16:49:58.764Z","logLevel":"warn","context":"behavior","message":"Behavior run partially failed","details":{"reason":{"type":"exception","message":"Protocol error (Runtime.evaluate): Target closed","stack":"TargetCloseError: Protocol error (Runtime.evaluate): Target closed\n at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:77:36)\n at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:106:25)\n at Connection.onMessage (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:130:25)\n at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:38:32)\n at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n at WebSocket.onMessage (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:209:9)\n at WebSocket.emit (node:events:524:28)\n at Receiver.receiverOnMessage (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1220:20)\n at Receiver.emit (node:events:524:28)\n at Immediate.<anonymous> (/app/node_modules/puppeteer-core/node_modules/ws/lib/receiver.js:601:16)"},"page":"https://gb64.com/oldsite/gameofweek/4/americanfeature/gotw_stripoker.htm","workerid":1}}
and later every page looks like this:
{"timestamp":"2025-05-24T16:56:06.356Z","logLevel":"error","context":"fetch","message":"Direct fetch of page URL timed out","details":{"seconds":91,"page":"https://gb64.com/oldsite/gameofweek/5/americanfeature2/gotw_track&field.htm","workerid":1}}
{"timestamp":"2025-05-24T16:56:06.401Z","logLevel":"warn","context":"recorder","message":"Error getting cookies","details":{"page":"https://gb64.com/oldsite/gameofweek/5/americanfeature2/gotw_track&field.htm","e":{"name":"TargetCloseError","cause":{"name":"ProtocolError"}}}}
Did you manage to confirm, by looking at the WARC content, that it really blocks the URLs you do not want to load? If yes, then something else is probably causing the browser crash...
It seems so:
{"timestamp":"2025-05-26T09:29:15.692Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://cloud.cbm8bit.com/zzap/1200-plain_grey4.png","errorText":"net::ERR_BLOCKED_BY_CLIENT.Inspector","type":"Image","status":0,"page":"https://gb64.com/forum/index.php","workerid":0}}
but only for the second argument. When I pass two of them:
--blockRules 'https://hg1.hitbox.com' --blockRules 'https://cloud.cbm8bit.com'
only the second one is passed (somehow overridden).
This is the run command from the logs:
INFO:Running browsertrix-crawler crawl: crawl --title gamebase 64 --description An attempt to document ALL Commodore 64 gameware before its too late --workers 2 --waitUntil domcontentloaded --depth 10 --pageLoadTimeout 91 --scopeExcludeRx .*oldsite.* --blockRules https://cloud.cbm8bit.com --behaviorTimeout 31 --saveState always --diskUtilization 90 --seeds https://gb64.com --userAgentSuffix +Zimit --cwd /output/.tmp23327r06
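One plausible explanation, assuming zimit declares this option with Python argparse's default "store" action (not verified against zimit's source): with "store", a repeated option silently overwrites the earlier value, whereas "append" would collect all of them. The helper below is a hypothetical illustration, not zimit's code:

```python
import argparse


def parse(action: str, argv: list[str]):
    # Hypothetical parser with a single option, declared with the given action.
    parser = argparse.ArgumentParser()
    parser.add_argument("--blockRules", action=action)
    return parser.parse_args(argv).blockRules


argv = ["--blockRules", "https://hg1.hitbox.com",
        "--blockRules", "https://cloud.cbm8bit.com"]

print(parse("store", argv))   # -> https://cloud.cbm8bit.com
print(parse("append", argv))  # -> ['https://hg1.hitbox.com', 'https://cloud.cbm8bit.com']
```

The "store" result matches what the logged command shows: only https://cloud.cbm8bit.com survives.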
So I'm not sure how to pass two URLs other than combining them into one regex (if that works).
Anyway, after crawling 70k pages with the whole oldsite excluded, I still get an error, but it seems a bit different now:
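A sketch of the single-regex workaround, assuming the value is compiled as an ordinary regular expression and tested against each request URL. The pattern itself is a hypothetical example, checked here with Python's re:

```python
import re

# One alternation covering both hosts. Dots are escaped so they only match
# a literal ".", and (?:/|$) stops e.g. "cloud.cbm8bit.com.example.org" matching.
BLOCK = re.compile(r"^https?://(?:hg1\.hitbox\.com|cloud\.cbm8bit\.com)(?:/|$)")


def is_blocked(url: str) -> bool:
    return BLOCK.search(url) is not None


print(is_blocked("https://hg1.hitbox.com/HG?hc=w153&l=y&hb=WQ500615O5SF28EN0&cd=1&n=DATABASE"))  # True
print(is_blocked("https://cloud.cbm8bit.com/zzap/1200-plain_grey4.png"))  # True
print(is_blocked("https://gb64.com/images/c64top/cornerfiller.gif"))  # False
```

On a shell command line, the pattern would need single quotes so the shell does not interpret the |, ?, $ or parentheses, e.g. --blockRules '^https?://(?:hg1\.hitbox\.com|cloud\.cbm8bit\.com)(?:/|$)'.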
{"timestamp":"2025-05-26T06:51:18.467Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://gb64.com/forum/ucp.php?mode=login&redirect=viewtopic.php%3Fstart%3D45%26t%3D4718","workerid":0}}
{"timestamp":"2025-05-26T06:51:18.923Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://cloud.cbm8bit.com/zzap/gb64_forum_background.png","errorText":"net::ERR_BLOCKED_BY_CLIENT.Inspector","type":"Image","status":0,"page":"https://gb64.com/forum/ucp.php?mode=login&redirect=viewtopic.php%3Fstart%3D45%26t%3D4718","workerid":0}}
{"timestamp":"2025-05-26T06:51:18.944Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://cloud.cbm8bit.com/zzap/1200-plain_grey4.png","errorText":"net::ERR_BLOCKED_BY_CLIENT.Inspector","type":"Image","status":0,"page":"https://gb64.com/forum/ucp.php?mode=login&redirect=viewtopic.php%3Fstart%3D45%26t%3D4718","workerid":0}}
{"timestamp":"2025-05-26T06:51:19.602Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://gb64.com/forum/viewtopic.php?p=19228","workerid":1}}
{"timestamp":"2025-05-26T06:51:19.626Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":1,"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=45&view=print"}}
{"timestamp":"2025-05-26T06:51:19.628Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":72806,"total":156432,"pending":2,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-05-26T06:51:19.626Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/viewtopic.php?t=4718&start=45&view=print\",\"added\":\"2025-05-25T03:15:39.085Z\",\"depth\":5}","{\"seedId\":0,\"started\":\"2025-05-26T06:51:18.348Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/ucp.php?mode=login&redirect=viewtopic.php%3Fstart%3D45%26t%3D4718\",\"added\":\"2025-05-25T03:15:39.014Z\",\"depth\":5}"]}}
{"timestamp":"2025-05-26T06:51:19.727Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=45&view=print","workerid":1}}
{"timestamp":"2025-05-26T06:51:20.097Z","logLevel":"warn","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:void(0);","page":"https://gb64.com/forum/ucp.php?mode=login&redirect=viewtopic.php%3Fstart%3D45%26t%3D4718","workerid":0}}
{"timestamp":"2025-05-26T06:51:27.777Z","logLevel":"error","context":"general","message":"Custom page load check timed out","details":{"seconds":5,"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=45&view=print","workerid":1}}
{"timestamp":"2025-05-26T06:51:32.784Z","logLevel":"error","context":"general","message":"Link extraction timed out","details":{"seconds":5,"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=45&view=print","workerid":1}}
{"timestamp":"2025-05-26T06:51:37.792Z","logLevel":"error","context":"general","message":"Timed out getting page title, something is likely wrong","details":{"seconds":5,"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=45&view=print","workerid":1}}
{"timestamp":"2025-05-26T06:51:51.182Z","logLevel":"warn","context":"behavior","message":"Behaviors timed out","details":{"seconds":31,"page":"https://gb64.com/forum/ucp.php?mode=login&redirect=viewtopic.php%3Fstart%3D45%26t%3D4718","workerid":0}}
{"timestamp":"2025-05-26T06:51:52.185Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":3,"page":"https://gb64.com/forum/ucp.php?mode=login&redirect=viewtopic.php%3Fstart%3D45%26t%3D4718","workerid":0}}
{"timestamp":"2025-05-26T06:51:52.212Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=75"}}
{"timestamp":"2025-05-26T06:51:52.215Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":72807,"total":156433,"pending":2,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-05-26T06:51:52.211Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/viewtopic.php?t=4718&start=75\",\"added\":\"2025-05-25T03:15:39.095Z\",\"depth\":5}","{\"seedId\":0,\"started\":\"2025-05-26T06:51:19.626Z\",\"extraHops\":0,\"url\":\"https:\\/\\/gb64.com\\/forum\\/viewtopic.php?t=4718&start=45&view=print\",\"added\":\"2025-05-25T03:15:39.085Z\",\"depth\":5}"]}}
{"timestamp":"2025-05-26T06:53:23.217Z","logLevel":"error","context":"fetch","message":"Direct fetch of page URL timed out","details":{"seconds":91,"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=75","workerid":0}}
{"timestamp":"2025-05-26T06:53:23.243Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://gb64.com/forum/viewtopic.php?p=19229"}}
So, from the above:
{"timestamp":"2025-05-26T06:51:37.792Z","logLevel":"error","context":"general","message":"Timed out getting page title, something is likely wrong","details":{"seconds":5,"page":"https://gb64.com/forum/viewtopic.php?t=4718&start=45&view=print","workerid":1}}
After that, I see no progress: the crawl constantly times out on pages. I also see that the Brave processes are doing nothing.
I will try excluding more URLs and also experiment more with
--keep --saveState always
to be able to resume somehow after what are really days of crawling.
I'm not sure why this is so unstable with this site I'm trying to get offline...
Late follow-up.
It seems that my problems come from non-headless runs.
I first tried disabling the display manager, after which it uses Xvfb with a headless buffer when it runs.
This still failed occasionally.
Later I discovered the obvious flag
--headless
and it seems to do the trick. I did a few runs that took a few days, and they completed successfully.
Thank you @clydzik for the follow-up. Doesn't running headless have other negative side effects? I.e., does the ZIM work as expected? I don't remember the exact details, but I feel like not running headless by default was a deliberate choice.
Hey. So far I haven't noticed any negative side effects. The ZIM was created and looks right when browsed. I will also do more runs without the exclusions I added earlier in this step (comment): https://github.com/openzim/zimit/issues/500#issuecomment-2900671519
I will post an update on stability.