benoit74 issues

Results 604 issues of benoit74

Impossible to pass multiple lang for ZIM

See https://farm.openzim.org/pipeline/9d98839a-87d3-4a7b-aaf7-4d8ad4168938

```
Traceback (most recent call last):
  File "/usr/bin/zimit", line 8, in <module>
    sys.exit(zimit.zimit())
             ~~~~~~~~~~~^^
  File "/app/zimit/lib/python3.13/site-packages/zimit/zimit.py", line 1247, in zimit
    sys.exit(run(sys.argv[1:]))
             ~~~^^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.13/site-packages/zimit/zimit.py", line 852, in run
    res...
```

bug

`--zim-lang` argument does not make it to the ZIM file

See https://browse.library.kiwix.org/raw/www.ready.gov_es_2024-12/meta/Language => `eng`, despite this being built with `--zim-lang esp` (see https://farm.openzim.org/pipeline/37390d21-a7c9-4eed-a206-e1a0fe0daed2/debug ). Not sure if the issue is at zimit level (we do not pass the parameter properly to...

bug

Stop crawler when we have been hit by a WAF protection

See details upstream: https://github.com/webrecorder/browsertrix-crawler/issues/711

enhancement

realitybloger.wordpress.com fails to be crawled

Someone insisted quite a lot on this website. It always fails on the seed page, with strange timeouts during link extraction and while looking for the page title.

bug

scraping_issue

Issue with websites based on Vuepress/Vitepress

See https://github.com/kiwix/kiwix-desktop/issues/1324

bug

question

Size limitation is not working as expected

Looking at zimit.kiwix.org jobs, it looks like there is a problem with the size limitation. This is most probably an upstream bug in the crawler; I will open an issue there with details.

bug

upstream

Automatically monitor and stop frozen crawler or warc2zim

Since we have the chance in Zimit to have a "monitoring" process, and since the current situation, as well as past ones, has shown that we regularly have situations where the crawler and/or...

enhancement
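The idea of a monitoring process that stops a frozen child can be sketched as below. This is only an illustration under stated assumptions, not Zimit's implementation: the function name `run_with_watchdog`, the `stall_timeout` and `poll_interval` parameters, and the choice of "no output for a while" as the definition of frozen are all hypothetical.

```python
import subprocess
import sys
import threading
import time


def run_with_watchdog(cmd, stall_timeout=600.0, poll_interval=1.0):
    """Run cmd and stop it if it prints nothing for stall_timeout seconds.

    Hypothetical helper: returns (return_code, was_stopped).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Single-item list so the reader thread can update the timestamp.
    last_activity = [time.monotonic()]

    def pump():
        # Assumption of this sketch: any output line counts as a sign of life.
        for _ in proc.stdout:
            last_activity[0] = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    stopped = False
    while proc.poll() is None:
        if time.monotonic() - last_activity[0] > stall_timeout:
            stopped = True
            proc.terminate()  # ask the process to stop first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # force stop if termination is ignored
            break
        time.sleep(poll_interval)

    proc.wait()
    reader.join(timeout=1)
    return proc.returncode, stopped


if __name__ == "__main__":
    # Demo: a child that prints once and then goes silent is stopped.
    frozen = [sys.executable, "-c",
              "import time; print('start', flush=True); time.sleep(60)"]
    print(run_with_watchdog(frozen, stall_timeout=2.0, poll_interval=0.2))
```

A real monitor would likely need a better liveness signal than raw output (for example a progress counter), since a process can keep logging while making no progress.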

Release 3.1.0

This issue serves as a checklist for the release event.

- [ ] Check that dependencies have been updated to latest version (especially warc2zim in pyproject.toml and browsertrix crawler in...

task

Resume failed browsertrix crawls

Every now and then, we have a very long crawl to perform. E.g. https://farm.openzim.org/recipes/shamela.ws_ar_al-tafsir-3 has ~500k pages to grab, and https://farm.openzim.org/recipes/ubuntuforums.org_en_all has already discovered ~400k pages. This poses two challenges...

enhancement

Display in API and UI when worker is `selfish`

In the DB we have a `selfish` column on workers, which is very important in terms of scheduling. This information must be:

- returned in the API (AFAIK it is not)...

enhancement