Book HTML images are not always sufficiently compressed
In order to solve https://github.com/openzim/gutenberg/issues/288, we've decided to stop compressing (optimizing) images on our own, especially images used in HTML books.
Recent runs and the analysis done in https://github.com/openzim/gutenberg/issues/374 showed that the optimization we were doing on images was far from useless.
Two examples below:
| | 2023-08 | 2025-10 |
|---|---|---|
| Size | 63k | 101k |
| Online at | https://dev.library.kiwix.org/content/gutenberg_de_all_2023-08/53217_fig1.jpg | https://browse.library.kiwix.org/content/gutenberg_de_all_2025-10/53217_fig1.jpg |

| | 2023-08 | 2025-10 |
|---|---|---|
| Size | 30k | 63k |
| Online at | https://dev.library.kiwix.org/content/gutenberg_de_all_2023-08/52492_abb12.jpg | https://browse.library.kiwix.org/content/gutenberg_de_all_2025-10/52492_abb12.jpg |
While the higher compression does introduce some visible artifacts, the difference in file size clearly favors it.
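To put a number on the two examples, here is a minimal sketch that computes the relative saving from the sizes listed in the tables. The function name `relative_saving` is only illustrative, and the sizes are the rounded kB values quoted above, so the percentages are approximate.

```python
def relative_saving(compressed_kb: float, uncompressed_kb: float) -> float:
    """Fraction of bytes saved by the smaller (more compressed) file."""
    return (uncompressed_kb - compressed_kb) / uncompressed_kb


# (2023-08 size, 2025-10 size) in kB, as quoted in the tables above
examples = {
    "53217_fig1.jpg": (63, 101),
    "52492_abb12.jpg": (30, 63),
}

for name, (old_kb, new_kb) in examples.items():
    saving = relative_saving(old_kb, new_kb)
    print(f"{name}: {saving:.0%} smaller with the 2023-08 optimization")
```

On these two files the 2023-08 versions come out roughly 38% and 52% smaller than the 2025-10 ones.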
We should probably:
- put compression of images back in place
- confirm expected size difference (should save about 2.11G on Gutenberg DE)
- decide how to handle optimization cache invalidation (see https://github.com/openzim/gutenberg/issues/288)
- put optimization cache back in place
Or is this kind of stronger image optimization something that should be done on the Gutenberg side directly? @eshellman can you advise? I imagine compression matters less on the Gutenberg side than on the Kiwix one, but the gain is still not negligible.
PG doesn't do much compression. I would bet that only compressing the largest files could get you 80% of the total size reduction benefit with zero quality degradation on current displays.
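One way to check that guess against a real image set would be to see how few of the largest files account for most of the bytes. This is only a sketch: `largest_files_covering` and the sample sizes are hypothetical, and share of total bytes is used as a rough proxy for share of the achievable size reduction.

```python
def largest_files_covering(sizes: dict[str, int], fraction: float = 0.8) -> list[str]:
    """Return the largest files whose sizes add up to at least `fraction` of all bytes."""
    target = sum(sizes.values()) * fraction
    selected: list[str] = []
    covered = 0
    # Walk the files from largest to smallest until the target share is reached
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        if covered >= target:
            break
        selected.append(name)
        covered += size
    return selected


# Hypothetical sizes in bytes, for illustration only (not measured on Gutenberg DE)
sizes = {
    "a.jpg": 900_000,
    "b.jpg": 500_000,
    "c.jpg": 60_000,
    "d.jpg": 30_000,
    "e.jpg": 10_000,
}
print(largest_files_covering(sizes))
```

With the sample data, two of the five files already cover more than 80% of the bytes; whether the real distribution is that skewed would have to be measured.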
OK, that contradicts what I understood from https://github.com/openzim/gutenberg/issues/288#issuecomment-3339000032, where you said "I'm not sure what optimizing you can do". Can you confirm that I misinterpreted you (hearing what I wanted to hear, a classic bias)?
https://github.com/gutenbergtools/ebookmaker/blob/fa2e91f4bf1ab75674919841c6c36a489596530b/src/ebookmaker/writers/EpubWriter.py#L50 sets the max image size to 256K (1M for linked images), so images smaller than that get no compression other than what zip does. What I was talking about in #288 was that, yes, the ebook files are rebuilt every month (and have been since 2017?)
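As a rough illustration of that rule (this is not ebookmaker's code), the sketch below models which images would fall under the limit. The names are hypothetical, and reading "256K" and "1M" as binary units with a strict "greater than" comparison is an assumption.

```python
MAX_IMAGE_BYTES = 256 * 1024          # assumed meaning of "256K"
MAX_LINKED_IMAGE_BYTES = 1024 * 1024  # assumed meaning of "1M"


def would_be_recompressed(size_bytes: int, linked: bool = False) -> bool:
    """True if an image exceeds the (assumed) size limit and so would be shrunk."""
    limit = MAX_LINKED_IMAGE_BYTES if linked else MAX_IMAGE_BYTES
    return size_bytes > limit


# Both example images from this issue (about 101k and 63k) are below the limit,
# which would explain why they reach us without additional compression.
print(would_be_recompressed(101 * 1024))  # False
print(would_be_recompressed(63 * 1024))   # False
```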