benoit74

Results 604 issues of benoit74

Probably since the upgrade to zimscraperlib 5, it is no longer possible to pass multiple languages as CSV: ``` Traceback (most recent call last): File "/usr/bin/zimit", line 8, in sys.exit(zimit.zimit())...
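
For illustration only, here is a minimal sketch of how a comma-separated language value could be split into individual codes before validation. The helper name `split_languages` is hypothetical and is not taken from zimit or zimscraperlib; the traceback above is truncated, so this does not claim to reproduce or fix the actual failure.

```python
def split_languages(value: str) -> list[str]:
    """Split a comma-separated string of language codes into a clean list.

    Hypothetical helper for illustration; not part of zimit or zimscraperlib.
    """
    # Trim whitespace around each code, then drop empty entries
    # such as those produced by a doubled or trailing comma.
    codes = [code.strip() for code in value.split(",")]
    return [code for code in codes if code]
```

Each resulting code could then be validated one at a time instead of passing the whole CSV string to a function that expects a single language.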

bug
regression

Currently, when the favicon used as illustration is not the proper size, we resize it. However, this does not reduce the file size. We should call `optimize_png` (available in scraperlib) after...

enhancement
good first issue

On some occasions, we have recipes where warc2zim takes a very long time to run. For instance https://farm.openzim.org/pipeline/466196d7-aa93-40cd-aec4-d8fb49294255: - browsertrix crawler started at 2025-03-03 20:52:24 - warc2zim started at 2025-03-14...

enhancement
question

See https://github.com/openzim/zim-requests/issues/1342

bug

In JS rewriting, we have a `replace_this_non_prop` rule which, for instance, transforms:
- `a = this;` into `a = _____WB$wombat$check$this$function_____(this)`
- `return this.location` into `return _____WB$wombat$check$this$function_____(this).location`

and so on. There...
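
The kind of transformation performed by `replace_this_non_prop` can be sketched roughly as follows. This is a naive regex approximation written for illustration, not the actual rule: the names `THIS_NON_PROP` and `wrap_this` are assumptions, and the sketch would also rewrite `this` inside strings and comments.

```python
import re

# Wrapper function name as it appears in the examples above.
WRAPPER = "_____WB$wombat$check$this$function_____"

# Match `this` only when it stands alone as an identifier:
# not preceded by a word character, `$` or `.` (so `obj.this` is skipped),
# and not followed by a word character or `$` (so `thisValue` is skipped).
THIS_NON_PROP = re.compile(r"(?<![\w$.])this(?![\w$])")


def wrap_this(js_source: str) -> str:
    """Wrap each standalone `this` in a call to the wrapper function."""
    # A callable replacement avoids any escape processing of the wrapper name.
    return THIS_NON_PROP.sub(lambda _match: f"{WRAPPER}(this)", js_source)
```

With this sketch, `wrap_this("return this.location")` yields `return _____WB$wombat$check$this$function_____(this).location`, while property accesses such as `obj.this` are left untouched.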

bug

See https://farm.zimit.kiwix.org/pipeline/e0d6a925-1892-4306-a6cf-b71791d23e42/debug While it is well known that some websites declare an improper encoding, it is weird to get a "None" encoding. To be analyzed. Web page with the problem: https://www.highlandwoodworking.com/finishing/wood-finishing-color-triangle.html

bug

In some cases (e.g. https://github.com/openzim/zim-requests/issues/1162, but I'm pretty sure https://github.com/openzim/warc2zim/issues/402 would need the same), we need to patch website JS so that it does not interfere badly once inside the...

enhancement
question

Task: https://farm.zimit.kiwix.org/pipeline/d5d36f11-fdf0-4fa8-a078-99a46b2250aa/debug command: ``` zimit --url=https://istorija.haroldas.net --name=istorija.haroldas.net_d9cf9925 --zim-file=istorija.haroldas.net_d9cf9925.zim --userAgentSuffix=zimit.kiwix.org+ --sizeLimit=4294967296 --timeLimit=7200 --output=/output --statsFilename=/output/task_progress.json [email protected] --keep --publisher=openZIM ``` stdout: ``` [warc2zim::2024-12-29 23:24:49,350] ERROR:Problem encountered while processing https://istorija.haroldas.net/?zip=storage. Traceback (most recent call...

bug

https://farm.zimit.kiwix.org/pipeline/063787bf-02ba-4cee-9d62-4a024f883967/debug ``` ).rewrite(self.content_str) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "", line 32, in rewrite File "/app/zimit/lib/python3.12/site-packages/zimscraperlib/rewriting/html.py", line 165, in rewrite self.close() File "/usr/lib/python3.12/html/parser.py", line 115, in close self.goahead(1) File "/usr/lib/python3.12/html/parser.py", line 179, in goahead...

bug

Task: https://farm.zimit.kiwix.org/pipeline/bb7f1afd-c1b3-4f26-bada-a5ea067cd6d4/debug Crawl was interrupted after 2 hours as expected. Only 390 pages have been crawled. However, I had to manually stop warc2zim because it was still processing after about...

bug
recipe