benoit74

Results 604 issues of benoit74

See https://farm.openzim.org/pipeline/9d98839a-87d3-4a7b-aaf7-4d8ad4168938

```
Traceback (most recent call last):
  File "/usr/bin/zimit", line 8, in <module>
    sys.exit(zimit.zimit())
             ~~~~~~~~~~~^^
  File "/app/zimit/lib/python3.13/site-packages/zimit/zimit.py", line 1247, in zimit
    sys.exit(run(sys.argv[1:]))
             ~~~^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.13/site-packages/zimit/zimit.py", line 852, in run
    res...
```

bug

The Language metadata at https://browse.library.kiwix.org/raw/www.ready.gov_es_2024-12/meta/Language is `eng`, even though this was built with `--zim-lang esp` (see https://farm.openzim.org/pipeline/37390d21-a7c9-4eed-a206-e1a0fe0daed2/debug). Not sure if the issue is at zimit level (we do not pass the parameter properly to...

bug

See details upstream: https://github.com/webrecorder/browsertrix-crawler/issues/711

enhancement

Someone insisted quite a lot on this website. It always fails on the seed page with strange timeouts during link extraction and while looking for the page title.

bug
scraping_issue

See https://github.com/kiwix/kiwix-desktop/issues/1324

bug
question

Looking at zimit.kiwix.org jobs, there appears to be a problem with the size limit. This is most probably an upstream bug in the crawler; I will open an issue there with details.

bug
upstream

Since Zimit gives us the chance to have a "monitoring" process, and since the current situation, as well as past ones, showed that we regularly have situations where the crawler and/or...

enhancement

This issue serves as a checklist for the release event.

- [ ] Check that dependencies have been updated to latest version (especially warc2zim in pyproject.toml and browsertrix crawler in...

task

Every now and then, we have a very long crawl to perform. E.g. https://farm.openzim.org/recipes/shamela.ws_ar_al-tafsir-3 has ~500k pages to grab, and https://farm.openzim.org/recipes/ubuntuforums.org_en_all has already discovered ~400k pages. This poses two challenges...

enhancement

In the DB we have a `selfish` column on workers, which is very important for scheduling. This information must be:

- returned in the API (AFAIK it is not)...

enhancement