python-scraperlib Should our image tools support SVG?

Our image functions all assume bitmap input or output. Should we support vector formats as well (SVG is the only open and widespread one, I think)? If so, should it use a different API or should this be handled transparently?

Oct 09 '23 10:10 rgaudin

Browsers support SVG 1.1: https://caniuse.com/svg

SVG 2 does not yet seem to be supported by browsers: https://svgwg.org/svg2-draft/

Never easy to align with "all" browsers... but if attainable... Awesome!

Oct 09 '23 10:10 holta

Would be nice indeed, but maybe we can wait for the first use case before doing it.

Oct 13 '23 16:10 kelson42

It is quite common for websites scraped with zimit to have a favicon in SVG format. This currently fails to be processed.

May 17 '24 16:05 benoit74

We should aim for a smooth conversion to the expected ZIM Illustration PNG format. For the rest, as @holta wrote, SVG is by now broadly supported and we should not force a bitmap conversion.

May 18 '24 13:05 kelson42

I propose using the following test set to choose the SVG library:

https://solar.lowtechmagazine.com/icons/sun.svg
https://docs.python.org/3/_static/py.svg
https://www.gstatic.com/images/icons/material/apps/fonts/1x/catalog/v5/favicon.svg
https://www.solidarite-numerique.fr/wp-content/themes/snum-v2/images//ico/favicon.svg
https://summerofcode.withgoogle.com/assets/favicons/safari-pinned-tab.svg

Jul 09 '24 06:07 benoit74

It looks like CairoSVG is the most popular library for this, but how well it is supported is questionable: https://github.com/Kozea/CairoSVG/issues/298 ; its main supporters even dropped it from their own software because of its limitations and the complexity of supporting it: https://www.courtbouillon.org/blog/00009-weasyprint-53-what-s-new/#svg-rendering.

An alternative could be svglib, but it also needs the ReportLab toolkit, support seems limited, its CI is failing, and Python 3.11 and 3.12 are not even mentioned.

Another alternative could be librsvg and its Python bindings. The advantage is that librsvg is a GNOME project and still seems to be supported. But the Python bindings clearly lack documentation, and I struggle to find out how to configure them properly.

The final alternative is an external tool like Inkscape or ImageMagick. Both are well supported and this is their "core purpose". I would tend to prefer ImageMagick for its versatility and because it is a pure command-line tool. Inkscape has a CLI, but it does not seem to provide CLI-only executables: you have to install the GUI and all its dependencies.

So my initial preference was to rely on ImageMagick for SVG to PNG conversion.

However, tests with the real SVGs of the test set turned out to be a disaster with ImageMagick (the Python one failed almost completely, the gfonts one is extremely blurry, and the others are blurry as well). I then tried the second contender on my list (cairosvg) and it worked very smoothly.

Command used with ImageMagick (it might be the problem; -density was added to try to improve the result but did not change anything):

convert -background none xxx_raw.svg -density 5000 -resize 48x48 xxx_im.png
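One thing that may be worth checking, as a sketch only: ImageMagick applies -density while it reads a vector input, so the setting generally has to come before the SVG file name to influence rasterization. The helper below only builds such an argument list. The function names, the file names and the density value of 384 are illustrative assumptions, and this has not been run against the test set.

```python
import subprocess


def build_convert_args(src, dst, size=48, density=384):
    """Build an ImageMagick `convert` command line (hypothetical helper).

    `-background` and `-density` are placed before the input file so that
    they apply while the SVG is being read; `-resize` runs afterwards.
    """
    return [
        "convert",
        "-background", "none",
        "-density", str(density),
        src,
        "-resize", f"{size}x{size}",
        dst,
    ]


def convert_svg(src, dst, size=48, density=384):
    # Requires the ImageMagick `convert` executable to be available on PATH.
    subprocess.run(build_convert_args(src, dst, size, density), check=True)
```

Calling convert_svg("xxx_raw.svg", "xxx_im.png") would then run the same conversion as above with the settings reordered.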

Code used with cairosvg:

import cairosvg
cairosvg.svg2png(url="https://solar.lowtechmagazine.com/icons/sun.svg", write_to="/data/solar_cairo.png", output_width=48, output_height=48)

I also confirmed that cairosvg does not distort the image if the original image is not square (should probably never happen, but who knows ...).
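For reference, "not distorting" a non-square image means scaling both sides by the same factor so that it fits inside the square. A minimal sketch of that arithmetic, independent of cairosvg (the function name and the rounding choice are assumptions, not a description of what cairosvg does internally):

```python
def fit_within(width, height, box=48):
    """Return (w, h) scaled uniformly so the image fits in a box x box square."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # The smaller of the two ratios is the one that keeps both sides in the box.
    scale = min(box / width, box / height)
    # Keep at least one pixel per side after rounding.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(200, 100))  # (48, 24)
```

A 200x100 source would thus end up as 48x24 rather than being stretched to 48x48.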

Finally, becoming a sponsor of the CourtBouillon team, which supports cairosvg as well as the tinycss2 library we are already using, is probably something to consider.

I will create a PR with cairosvg.

Jul 09 '24 08:07 benoit74

My experience (mostly with MediaWiki) is with librsvg. It is already a few years old, but it works fine.

Jul 10 '24 07:07 kelson42

Just changed the title to confirm that this issue will focus on conversion (todo) + probing (probably already working with PIL). SVG optimization is already tracked in https://github.com/openzim/python-scraperlib/issues/80 and will need different tooling.
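As a rough illustration of what probing could look like without any rendering library, the sketch below checks whether some bytes parse as XML with an svg root element. It uses only the standard library; the function name is hypothetical and this is not the scraperlib API.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def looks_like_svg(data: bytes) -> bool:
    """Return True if `data` parses as XML whose root element is <svg>."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # ElementTree encodes namespaced tags as "{namespace}localname".
    return root.tag in ("svg", f"{{{SVG_NAMESPACE}}}svg")
```

This only inspects the root element and says nothing about whether the file renders correctly. For untrusted input, a hardened XML parser may be preferable to the standard library one.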

Jul 10 '24 10:07 benoit74