mwoffliner
Detect image size from HTML attribute instead of URL regex and recompress
Following https://phabricator.wikimedia.org/T360589
Wikimedia will start to serve images only at a fixed set of widths, to save storage in their cache.
For instance, if the Wikitext requests the image thumbnail at 900px, the image actually served will be at 1000px (because this is the smallest resolution, among the fixed set they have chosen, that is at least as large as the requested one).
The consequence for us is that the image URL will contain this 1000px information, and we will hence retrieve and store the image at this resolution, even though we can know that it is never used at any resolution larger than 900px.
Do we want to modify the scraper to use this 900px information and automatically resize the image retrieved from MediaWiki?
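To make the rounding concrete, here is a minimal sketch of how a requested width would map to a served width. The bucket list and the function name `servedWidth` are assumptions chosen only to reproduce the 900px to 1000px example above; the actual set of widths is defined by Wikimedia (see the Phabricator task), and the fallback for requests above the largest bucket is also a guess.

```typescript
// Hypothetical bucket list, chosen to match the 900px -> 1000px example.
// This is NOT Wikimedia's actual configuration.
const ASSUMED_WIDTH_BUCKETS: readonly number[] = [250, 500, 1000, 2000];

// Return the smallest bucket that is at least as wide as the request.
// Assumption: requests wider than every bucket fall back to the largest one.
function servedWidth(
  requested: number,
  buckets: readonly number[] = ASSUMED_WIDTH_BUCKETS,
): number {
  const sorted = [...buckets].sort((a, b) => a - b);
  let largest: number | undefined;
  for (const width of sorted) {
    if (width >= requested) {
      return width;
    }
    largest = width;
  }
  // With an empty bucket list there is nothing to round to.
  return largest ?? requested;
}

console.log(servedWidth(900)); // 1000
```

With these assumed buckets, a 900px request yields a 1000px file, which is the extra 100px the scraper would otherwise store.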
Edit: we can know that the image is not used at any resolution larger than 900px, and we previously retrieved this image at 900px.
Yes, I believe this is very desirable and necessary. I know there has been contention about image sizing, but assuming we have a reliable way of computing what size an image should be, we should resize to that size in mwoffliner.
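As a rough illustration of what "computing what size an image should be" could look like, the sketch below compares the width embedded in the thumbnail URL with the `width` attribute of the `<img>` tag and returns a resize target only when the displayed width is smaller. All function names are hypothetical and this is not mwoffliner's existing code; the regexes assume a `.../<N>px-<name>` thumbnail URL and a plain numeric `width` attribute, and a real implementation would more likely read the attribute from a parsed DOM node.

```typescript
// Extract the width encoded in a thumbnail URL such as ".../1000px-Foo.jpg".
function widthFromThumbUrl(url: string): number | null {
  const match = /\/(\d+)px-[^/]+$/.exec(url);
  return match ? Number(match[1]) : null;
}

// Extract the displayed width from the <img> tag's width attribute.
// The leading \s keeps attributes like data-file-width from matching.
function widthFromImgTag(imgHtml: string): number | null {
  const match = /\swidth=["']?(\d+)/i.exec(imgHtml);
  return match ? Number(match[1]) : null;
}

// Return the width to resize to, or null when no resize is needed
// (unknown widths, or the served image is not larger than displayed).
function resizeTarget(url: string, imgHtml: string): number | null {
  const served = widthFromThumbUrl(url);
  const displayed = widthFromImgTag(imgHtml);
  if (served === null || displayed === null) {
    return null;
  }
  return displayed < served ? displayed : null;
}

const exampleUrl =
  "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Foo.jpg/1000px-Foo.jpg";
const exampleImg = '<img src="Foo.jpg" width="900" height="600" data-file-width="4000">';
console.log(resizeTarget(exampleUrl, exampleImg)); // 900
```

Returning `null` rather than the served width keeps the "nothing to do" case explicit, so the caller can skip recompression entirely when the downloaded image already matches the displayed size.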
Do we want to modify the scraper to use this 900px information and automatically resize the image retrieved from MediaWiki?
I guess we have no alternative? This would be an extension of the media optimisation feature set?
Looks so