benoit74

Results 604 issues of benoit74

We have a task on farm.zimit.kiwix.org which requests more memory than mwoffliner workers have available: ![Image](https://github.com/user-attachments/assets/7e685e20-7768-47f6-9b43-a01189a4d255) ![Image](https://github.com/user-attachments/assets/bb17771b-b740-4a00-acd1-dc191f28b531) I removed the task on the farm. Is it an issue to fix...

bug
question

Fix documentation: the `docker-compose` command does not exist anymore now that Compose is part of Docker; one has to use `docker compose`

In https://github.com/openzim/wp1/blob/main/docker-compose.yml (prod configuration), logs of the reverse-proxy and letsencrypt containers are set to `logging.driver: 'none'`, i.e. we immediately drop the logs. During https://github.com/kiwix/operations/issues/211, this prevented me from diagnosing an...

question

See https://github.com/openzim/zim-requests/issues/272#issuecomment-2466938768 Since files are hosted on upload.wikimedia.org, we must comply with their User-Agent policy at https://meta.wikimedia.org/wiki/User-Agent_policy I suggest we add a CLI option to pass a custom User-Agent to...

enhancement
good first issue

I had to skip this test which is failing for now with most recent libzim: https://github.com/openzim/python-scraperlib/blob/fef63f81fdb9dd6d2a5e17d9c8785e3fd22665e9/tests/zim/test_indexing.py#L114-L144 This looks like an upstream issue, hopefully only at read time: https://github.com/openzim/libzim/issues/981

upstream
regression

This issue serves as a checklist for the release event. - [ ] Ensure the CI is green on git `main` - [ ] Check that dependency ranges are ok,...

task

Ruff / Flake8 has a new rule `A005`: https://docs.astral.sh/ruff/rules/stdlib-module-shadowing/ It is recommended to not shadow Python standard-library modules. Currently, we have 5 issues: ``` src/zimscraperlib/html.py:1:1: A005 Module `html` shadows a...

enhancement

For files hosted on upload.wikimedia.org, we must comply with their User-Agent policy at https://meta.wikimedia.org/wiki/User-Agent_policy Doing so at scraperlib level in `stream_file` (the main method used in many scrapers to download files...

enhancement
question
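
To illustrate the User-Agent idea discussed in the issue above, here is a minimal sketch using only the Python standard library. It is not scraperlib's `stream_file` implementation; `build_user_agent`, `make_request`, and the tool name, version and contact address are hypothetical placeholders. It only assumes that a descriptive User-Agent with contact information is what the policy asks for.

```python
import urllib.request


def build_user_agent(tool: str, version: str, contact: str) -> str:
    """Return a descriptive User-Agent string that includes contact information."""
    return f"{tool}/{version} ({contact})"


def make_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical values, for illustration only.
ua = build_user_agent("example-scraper", "1.0", "contact@example.org")
req = make_request("https://upload.wikimedia.org/", ua)
```

The request is only constructed here, never sent, so the sketch runs without network access; a real scraper would pass the same header to whatever HTTP client it uses.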

This PR enriches scraperlib with a `ScraperExecutor`. This class can process tasks in parallel, with a given number of worker **threads**. This executor is mainly inspired by...
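
As a rough illustration of processing tasks in parallel with a fixed number of worker threads, here is a sketch built on the standard library's `ThreadPoolExecutor`. It does not reproduce the `ScraperExecutor` API from the PR; `run_in_threads` and its parameters are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_threads(func, items, nb_workers=2):
    """Apply func to each item using nb_workers threads; return results keyed by item."""
    results = {}
    with ThreadPoolExecutor(max_workers=nb_workers) as executor:
        # Map each submitted future back to the item it was created for.
        futures = {executor.submit(func, item): item for item in items}
        for future in as_completed(futures):
            # future.result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results


squares = run_in_threads(lambda n: n * n, [1, 2, 3, 4], nb_workers=2)
```

Threads suit I/O-bound work such as downloads; results are collected as tasks complete, so they are keyed by item instead of relying on completion order.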

**Describe the bug** I have a test ZIM with an `iframe` which has a `srcdoc` attribute instead of the classic `src` (i.e. the code is "inline"). The iframe stays blank. This...

bug