python-scraperlib
Collection of Python code to re-use across Python-based scrapers
This issue serves as a checklist for the release event. - [ ] Ensure the CI is green on git `main` - [ ] Check that dependency ranges are ok,...
#134 removed the i18n translation support from python-scraperlib because it relied too much on system dependencies (not working on Windows, for instance) and the API was not mature enough...
Here's a first shot at an implementation of zimwriterfs using scraperlib. It uses the same interface except for two missing features: - `--inflateHtml`: not sure it's useful at all -...
Just running CI again on the `main` branch to have a look.
In `validate_zimfile_creatable`, nothing is specifically tied to the fact that we are manipulating a ZIM file. The method should hence be renamed `validate_file_creatable`. Obviously this would be a breaking change,...
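One common way to soften a rename like this is to keep the old name as a thin deprecated alias for a release or two. The sketch below only illustrates that pattern: the function body is a simplified stand-in (a writable-directory check), not scraperlib's actual validation logic, and nothing here reflects what the maintainers decided.

```python
import os
import pathlib
import warnings


def validate_file_creatable(folder, filename):
    """Raise OSError if `filename` cannot be created inside `folder`.

    Stand-in body for illustration only; the real checks may differ.
    """
    folder = pathlib.Path(folder)
    if not folder.is_dir() or not os.access(folder, os.W_OK):
        raise OSError(f"{folder / filename} cannot be created")


def validate_zimfile_creatable(folder, filename):
    """Deprecated alias kept so that existing callers do not break."""
    warnings.warn(
        "validate_zimfile_creatable is deprecated, "
        "use validate_file_creatable instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    validate_file_creatable(folder, filename)
```

Callers of the old name keep working but see a `DeprecationWarning`, which leaves room to remove the alias in a later major release.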
The `callback` argument of `zim.creator.Creator.add_item_for` is reported by pyright as partially unknown when in `strict` mode due to the use of generic `Callable`. This is a problem in projects setting `pyright`...
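To illustrate the typing problem in general terms: a bare `Callable` annotation has unknown parameter and return types, which pyright's strict mode flags, while `Callable[..., Any]` is fully parameterized. The function and alias names below are hypothetical and are not scraperlib's API; this is only a sketch of the annotation style.

```python
from typing import Any, Callable, Optional

# Hypothetical alias: explicitly parameterized, so no part of the type is unknown.
Callback = Callable[..., Any]


def run_with_callback(value: int, callback: Optional[Callback] = None) -> int:
    """Double `value`, then pass the result to `callback` if one is given."""
    result = value * 2
    if callback is not None:
        callback(result)
    return result
```

Annotating the parameter as plain `Optional[Callable]` would behave identically at runtime; the difference only shows up in static analysis.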
When recently building the DevDocs scraper, I realized there are a ton of things that I was relying on @benoit74's expertise for to make the scraper sustainable for ZimFarm but...
We are publishing more and more ZIM files with videos using many different scrapers. To do that we mainly: * Re-encode videos/audio streams * Handle subtitles * Display all of...
Not sure what to do about it, but the `image` module depends on a transitive dependency of [`ffmpeg`](https://packages.debian.org/bookworm/ffmpeg) (`libcairo` via `libavcodec`). This means that someone not planning on using any...