python-scraperlib

Collection of Python code to re-use across Python-based scrapers

Results 52 python-scraperlib issues

Currently, we rely on various objects in scraperlib to:
- create the ZIM
- re-encode videos and images
- cache these assets on the optimization cache

We might consider to...

enhancement
question

As discussed in https://github.com/openzim/sotoki/pull/162#issuecomment-660452579, it actually seems a bit odd to handle duplicate files in the scrapers. We could instead have a system of redirects that keeps a single copy of...

enhancement
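The idea above (keep one copy of each file and redirect duplicates to it) can be illustrated with a minimal sketch. This is not scraperlib code: `plan_deduplication` is a hypothetical helper, and hashing file content with SHA-256 is an assumption about how duplicates might be detected.

```python
import hashlib
from pathlib import Path


def plan_deduplication(files: dict[str, Path]) -> tuple[dict[str, Path], dict[str, str]]:
    """Split entries into unique files and redirects (hypothetical helper).

    `files` maps a target path in the archive to a file on disk. Returns the
    entries to store once, plus a mapping of duplicate path -> path to redirect to.
    """
    first_seen: dict[str, str] = {}  # content digest -> first path with that content
    unique: dict[str, Path] = {}
    redirects: dict[str, str] = {}
    for target_path, fpath in files.items():
        digest = hashlib.sha256(fpath.read_bytes()).hexdigest()
        if digest in first_seen:
            # Same bytes already planned: point this path at the first copy.
            redirects[target_path] = first_seen[digest]
        else:
            first_seen[digest] = target_path
            unique[target_path] = fpath
    return unique, redirects
```

A caller would then add each entry in `unique` as a regular item and each entry in `redirects` as a redirect, so the scraper itself no longer tracks duplicates.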

[PEP 585](https://peps.python.org/pep-0585/) introduced support for the generics syntax in all standard collections currently available in the `typing` module. At the same time, it deprecated the use of `typing` for all these...

enhancement
good first issue
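As a small illustration of the PEP 585 style, the sketch below uses built-in generics (`list[str]`, `dict[str, int]`) and `collections.abc.Iterable` where older code would import `List`, `Dict`, and `Iterable` from `typing`. It assumes Python 3.9 or later; `count_labels` is a made-up function for the example, not part of scraperlib.

```python
from collections.abc import Iterable


def count_labels(issues: Iterable[dict[str, list[str]]]) -> dict[str, int]:
    """Count how often each label appears across issues (illustrative only)."""
    counts: dict[str, int] = {}
    for issue in issues:
        for label in issue.get("labels", []):
            counts[label] = counts.get(label, 0) + 1
    return counts
```

The pre-PEP 585 equivalent of the signature would be `Iterable[Dict[str, List[str]]] -> Dict[str, int]` with all three names imported from `typing`.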

It offers better compression.

enhancement
question

Testing WebP support on youtube showed that `WebpHigh` doesn't produce *high*-quality thumbnails. As these image presets are going to be used everywhere, it's important that, now that the rest...

question
stale

As we're seeing SVG as source images in scraper sources, it'd be good to have an SVG optimizer/cleaner. Probably lossless only at this point.
- [svgo](https://github.com/svg/svgo) probably most popular (node)
-...

enhancement
stale

We use kiwix_storagelib to implement the S3-based optimization cache in the scrapers. However, this gives rise to redundant code. We put a version of the file along with the optimizer...

enhancement
stale

We have a helper delete_callback at https://github.com/openzim/python-scraperlib/blob/335d5271e106b374f1aca871d19557ff2c81582d/src/zimscraperlib/filesystem.py#L47. This delete_callback is meant to be used as a callback when adding an item to the ZIM, typically to delete the original file once...

enhancement
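The callback pattern described above can be sketched as follows. This is an assumption-laden illustration, not the scraperlib helper: `delete_file` and `fake_add_item` are hypothetical stand-ins, and the real `delete_callback` signature should be checked in the linked source.

```python
import functools
from pathlib import Path


def delete_file(fpath: Path) -> None:
    """Remove the source file; do nothing if it is already gone."""
    fpath.unlink(missing_ok=True)


def fake_add_item(fpath: Path, callback) -> int:
    """Stand-in for a writer: consume the file, then run the callback."""
    size = len(fpath.read_bytes())  # pretend the content has been written
    callback()  # safe to delete the original only after it was consumed
    return size


def add_and_cleanup(fpath: Path) -> int:
    """Add a file and delete the original once it has been processed."""
    return fake_add_item(fpath, callback=functools.partial(delete_file, fpath))
```

Binding the path with `functools.partial` keeps the callback argument-free, so the writer does not need to know which file to clean up.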

Since 4.0.0, it looks like automatic indexing of PDFs has made scraperlib significantly slower at processing items. It is probably linked to the fact that with the current 4.0.0 implementation...

bug

The README should mention that libcairo is mandatory for SVG operations.

bug