python-scraperlib

Comprehensive benchmark of image presets

Open rgaudin opened this issue 5 years ago • 7 comments

Testing WebP support on YouTube showed that WebpHigh doesn't produce high-quality thumbnails. As these image presets are going to be used everywhere, it's important that, now that the rest works and before we roll them out, we run a benchmark of all presets.

What I envision is a table built from a small list (10?) of images used in our scrapers for each format, showing side by side the original and the three preset versions.
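A rough sketch of how such a table could be generated, assuming the converted files and their sizes are already known. The preset labels, the row layout and the function names here are hypothetical and only illustrate the idea; they are not part of scraperlib.

```python
import html

# Hypothetical labels for the three presets (assumption, not scraperlib names).
PRESETS = ("low", "medium", "high")


def human_size(nbytes: int) -> str:
    """Format a byte count as KiB with one decimal."""
    return f"{nbytes / 1024:.1f} KiB"


def cell(path: str, nbytes: int) -> str:
    """One table cell: the image itself plus its file size."""
    safe = html.escape(path, quote=True)
    return f'<td><img src="{safe}" alt="{safe}"><br>{human_size(nbytes)}</td>'


def build_report(rows: list[dict[str, tuple[str, int]]]) -> str:
    """Render rows of {"original" or preset: (path, size_in_bytes)} as an HTML table."""
    columns = ("original", *PRESETS)
    header = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>" + "".join(cell(*row[col]) for col in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"
```

Each row holds one source image; writing the returned string to a report.html next to the images gives the side-by-side view.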

That should help us validate or revise the presets variable. It would also serve as a reference in the future, when we have to choose what preset to use for a scraper.

http://tmp.kiwix.org/youtube/report.html can serve as an inspiration

rgaudin avatar Sep 15 '20 09:09 rgaudin

Note to self: {"lossless": False, "quality": 100, "method": 4} works well for a good preset (stole that from imagemagick's defaults). We may want to add a lossless preset? As already discussed, in many scenarios, lossless increases file size (especially on youtube's webp which are probably well optimized already)
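For reference, a minimal sketch of how these options could be passed to Pillow's WebP writer, which accepts `lossless`, `quality` and `method` as save arguments. The `encode_webp` helper and the constant name are hypothetical and are not how scraperlib wires its presets.

```python
import io

# Options quoted in the note above, passed through as Pillow save arguments.
WEBP_OPTIONS = {"lossless": False, "quality": 100, "method": 4}


def encode_webp(image, options=WEBP_OPTIONS) -> bytes:
    """Encode a PIL.Image.Image to WebP bytes using the given save options."""
    buffer = io.BytesIO()
    image.save(buffer, format="WEBP", **options)
    return buffer.getvalue()
```

With Pillow installed, `len(encode_webp(Image.open("thumb.jpg")))` gives the encoded size to compare against the source file (the file name is just a placeholder).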

rgaudin avatar Sep 15 '20 10:09 rgaudin

I wrote a small script to check this and prepared a set of images of different sizes and types, with 10 images for each format. Here are all the images that I used - https://drive.google.com/file/d/1KJF1wJvsWMxWK0PNcLPsMzuhvuGrkzyB/view?usp=sharing

Also, here's the output (it contains all images and a report.html file which can be viewed for the table representation) - https://drive.google.com/file/d/1O0LV_mUl6MF8Eenmkbap92XDhpSuRlpQ/view?usp=sharing

Here's the python script that I used - https://gist.github.com/satyamtg/31975ae1400e61633f0fbadd6f042c0c

The main thing I see is that for the JPEG and PNG presets, the medium and low ones give images of the same size. For PNG, this may be because both end up as 256-color images, probably due to the default values required by optimize_images. For JPEG, we need to investigate further.

The results can be viewed here directly - http://tmp.kiwix.org/imgbench/report.html or http://tmp.kiwix.org/imgbench/report-small.html

satyamtg avatar Sep 15 '20 13:09 satyamtg

Thank you for this.

As you wrote, on JPEG, low and medium are exactly the same. That's a problem. PNG low/medium are also similar. I noticed the limited colors on medium as well (on the molecular thing), which I think we don't want on medium. We could change that. There's a chance that we then get results similar to High, though.

So I checked in detail and figured out that for the quality param to be used, you need to enable fast_mode.

Please fix that for JPEG and don't reduce colors on PNG medium; also change WebpHigh to what I proposed above.

rgaudin avatar Sep 21 '20 15:09 rgaudin

I changed the presets as you mentioned in #67. However, the WebP preset is a bit different, as the one you proposed didn't yield a smaller file size on any of the images I tested with. So, I've used {"quality": 90, "method": 6, "lossless": False}. Method 6 uses stronger compression.

Here's the result with what I used - https://drive.google.com/file/d/1jXNulMPfD1P3SMdd0VdVyll3XDeBAB_Z/view?usp=sharing

Here's the result with what you proposed earlier - https://drive.google.com/file/d/1qXTSM6HqhiAUyBFyvPqeapYOPM9hB-c2/view?usp=sharing

I think that there's a decrease in file size without significant loss in quality. What are your thoughts @rgaudin ?
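A small sketch of how the two candidates could be compared numerically once the output sizes are collected. The `summarize` helper and the shape of its input are hypothetical; the benchmark script linked earlier may do this differently.

```python
from statistics import mean


def summarize(sizes: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Summarize per-candidate results.

    `sizes` maps an image name to {"original": bytes, "<candidate>": bytes, ...}.
    For each candidate, returns the mean output/original size ratio and the
    number of images whose output was not smaller than the original.
    """
    collected: dict[str, dict] = {}
    for per_image in sizes.values():
        original = per_image["original"]
        for name, size in per_image.items():
            if name == "original":
                continue
            entry = collected.setdefault(name, {"ratios": [], "not_smaller": 0})
            entry["ratios"].append(size / original)
            entry["not_smaller"] += size >= original
    return {
        name: {"mean_ratio": mean(e["ratios"]), "not_smaller": e["not_smaller"]}
        for name, e in collected.items()
    }
```

A candidate with a non-zero `not_smaller` count is one that failed to shrink some of the images, which is the situation described above for the earlier proposal.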

satyamtg avatar Sep 21 '20 18:09 satyamtg

OK, tested it on what triggered my attention and it's not noticeable. Nice work!

rgaudin avatar Sep 22 '20 08:09 rgaudin

Reopening after discussions with BSF, as we may use their knowledge to redo the benchmark and choose a better quality/compression ratio.

rgaudin avatar Jun 01 '22 08:06 rgaudin

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] avatar Aug 13 '22 10:08 stale[bot]