benoit74
See https://farm.openzim.org/pipeline/9d98839a-87d3-4a7b-aaf7-4d8ad4168938

```
Traceback (most recent call last):
  File "/usr/bin/zimit", line 8, in <module>
    sys.exit(zimit.zimit())
             ~~~~~~~~~~~^^
  File "/app/zimit/lib/python3.13/site-packages/zimit/zimit.py", line 1247, in zimit
    sys.exit(run(sys.argv[1:]))
             ~~~^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.13/site-packages/zimit/zimit.py", line 852, in run
    res...
```
See https://browse.library.kiwix.org/raw/www.ready.gov_es_2024-12/meta/Language: the value is `eng` even though this ZIM was built with `--zim-lang esp` at https://farm.openzim.org/pipeline/37390d21-a7c9-4eed-a206-e1a0fe0daed2/debug. Not sure whether the issue is at zimit level (we do not pass the parameter properly to...
See details upstream: https://github.com/webrecorder/browsertrix-crawler/issues/711
Someone has insisted quite a lot on this website. The crawl always fails on the seed page, with strange timeouts during link extraction and while looking for the page title.
See https://github.com/kiwix/kiwix-desktop/issues/1324
Looking at zimit.kiwix.org jobs, there seems to be a problem with the size limit. This is most probably an upstream bug in the crawler; I will open an issue there with details.
Since Zimit has the advantage of running a "monitoring" process, and since both the current situation and past ones have shown that we regularly run into cases where the crawler and/or...
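Assuming the monitoring idea is about detecting a crawler that has stopped making progress, one way to picture it is a watchdog that treats any crawler output as a sign of life and terminates the process after a period of silence. This is a minimal sketch only: `StallDetector` and `run_with_watchdog` are hypothetical names, not existing zimit functions, and "a line on stdout" stands in for whatever progress signal zimit would actually watch.

```python
import subprocess
import threading
import time


class StallDetector:
    """Flags a stall when no progress was recorded for more than `timeout` seconds."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_progress = clock()

    def record_progress(self) -> None:
        self.last_progress = self.clock()

    def is_stalled(self) -> bool:
        return self.clock() - self.last_progress > self.timeout


def run_with_watchdog(
    cmd: list[str], stall_timeout: float, poll_interval: float = 0.1
) -> tuple[int, bool]:
    """Run `cmd`, counting each stdout line as progress; stop it if it goes silent.

    Returns (return_code, was_stalled). Hypothetical helper for illustration.
    """
    detector = StallDetector(stall_timeout)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

    def pump_output() -> None:
        # Every line printed by the child resets the stall timer.
        for _line in proc.stdout:
            detector.record_progress()

    reader = threading.Thread(target=pump_output, daemon=True)
    reader.start()

    stalled = False
    while proc.poll() is None:
        if detector.is_stalled():
            stalled = True
            proc.terminate()  # ask politely first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # then force it
            break
        time.sleep(poll_interval)

    returncode = proc.wait()
    reader.join(timeout=1)
    return returncode, stalled
```

Keeping the stall logic in a small class with an injectable clock separates the "is it stuck?" decision from process handling, so the threshold policy can be tuned or tested on its own.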
This issue serves as a checklist for the release event.

- [ ] Check that dependencies have been updated to the latest version (especially warc2zim in pyproject.toml and browsertrix crawler in...
Every now and then, we have very long crawls to perform. E.g. https://farm.openzim.org/recipes/shamela.ws_ar_al-tafsir-3 has ~500k pages to grab, and https://farm.openzim.org/recipes/ubuntuforums.org_en_all has already discovered ~400k pages. This poses two challenges...
In the DB we have a `selfish` column on workers which is very important for scheduling. This information must be:

- returned in the API (AFAIK it is not)...
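To illustrate the first point, here is a minimal sketch of an explicit worker response schema that includes `selfish`. It assumes the API serializes workers through a dedicated output type; `WorkerOut`, `serialize_worker`, and the `name` field are hypothetical and not taken from the actual codebase.

```python
from dataclasses import asdict, dataclass


@dataclass
class WorkerOut:
    """Hypothetical API representation of a worker (only `selfish` comes from the DB column named above)."""

    name: str
    selfish: bool  # exposed so API clients can see this scheduling constraint


def serialize_worker(row: dict) -> dict:
    # Copy only the public fields from a DB row; anything else (e.g. secrets) is dropped.
    return asdict(WorkerOut(name=row["name"], selfish=bool(row["selfish"])))
```

With an explicit output schema like this, a DB column is only exposed once it is added to the schema, which is one plausible way a field such as `selfish` ends up missing from API responses.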