
Detect image size from HTML attribute instead of URL regex and recompress

Open benoit74 opened this issue 8 months ago • 4 comments

Following https://phabricator.wikimedia.org/T360589

Wikimedia will start serving images only at a fixed set of widths, to save storage in their cache.

For instance, if the wikitext requests an image thumbnail at 900px, the image actually served will be 1000px wide (because this is the smallest width in their chosen fixed set that is at least as large as the requested one).
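To make the 900px to 1000px example concrete, here is a minimal sketch of that selection rule. The list `ASSUMED_BUCKETS` and the function `servedWidth` are hypothetical names used for illustration; the actual set of widths is defined by Wikimedia (see the Phabricator task) and is not reproduced here.

```typescript
// Hypothetical set of fixed thumbnail widths, for illustration only.
// The real list is chosen by Wikimedia and may differ.
const ASSUMED_BUCKETS: readonly number[] = [250, 500, 1000, 2000];

// Return the smallest available width that is at least the requested
// width, or undefined when the request exceeds every available width.
function servedWidth(
  requested: number,
  buckets: readonly number[],
): number | undefined {
  const candidates = buckets.filter((w) => w >= requested);
  return candidates.length > 0 ? Math.min(...candidates) : undefined;
}
```

With these assumed values, `servedWidth(900, ASSUMED_BUCKETS)` returns `1000`, which matches the example above.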

The consequence for us is that the image URL will contain this 1000px value, so we will retrieve and store the image at that resolution, even though we could know that it is never displayed wider than 900px.

Do we want to modify the scraper to use this 900px information and automatically resize the image retrieved from MediaWiki?
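As a rough sketch of what this could look like (not mwoffliner's actual code), the snippet below compares the width found in the thumbnail URL with the `width` attribute of the `<img>` tag and proposes a resize target only when the HTML width is smaller. The names `ResizePlan`, `widthFromUrl`, `widthFromAttribute` and `planResize` are hypothetical, and the sketch assumes the thumbnail file name starts with `<N>px-` and that the tag carries a plain numeric `width` attribute. It parses the tag with regular expressions only to stay self-contained; a real scraper would more likely read the attribute from its DOM.

```typescript
interface ResizePlan {
  urlWidth: number | undefined; // width encoded in the thumbnail URL
  htmlWidth: number | undefined; // width requested by the HTML attribute
  resizeTo: number | undefined; // set only when downscaling is worthwhile
}

// Extract N from a thumbnail URL whose last path segment is "<N>px-<name>".
function widthFromUrl(src: string): number | undefined {
  const match = /\/(\d+)px-[^/]+$/.exec(src);
  return match ? Number(match[1]) : undefined;
}

// Extract the numeric width attribute. The leading \s keeps attributes
// such as data-file-width from matching.
function widthFromAttribute(imgTag: string): number | undefined {
  const match = /\swidth\s*=\s*["']?(\d+)/i.exec(imgTag);
  return match ? Number(match[1]) : undefined;
}

// Decide whether the downloaded image should be resized, and to what width.
function planResize(imgTag: string): ResizePlan {
  const srcMatch = /\ssrc\s*=\s*["']([^"']+)["']/i.exec(imgTag);
  const urlWidth = srcMatch ? widthFromUrl(srcMatch[1]) : undefined;
  const htmlWidth = widthFromAttribute(imgTag);
  const resizeTo =
    urlWidth !== undefined && htmlWidth !== undefined && htmlWidth < urlWidth
      ? htmlWidth
      : undefined;
  return { urlWidth, htmlWidth, resizeTo };
}
```

For a tag whose `src` ends in `/1000px-Example.jpg` and whose `width` attribute is `900`, `planResize` reports `resizeTo: 900`. When the two widths already agree, `resizeTo` stays `undefined` and the image can be stored as downloaded.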

benoit74 avatar Mar 21 '25 08:03 benoit74

Edit: we could know that the image is not used at any resolution larger than 900px, and until now we retrieved this image at 900px.

benoit74 avatar Mar 21 '25 08:03 benoit74

Yes, I believe this is very desirable and necessary. I know there has been contention about image sizing, but assuming we have a reliable way of computing what size an image should be, we should resize to that size in mwoffliner.

audiodude avatar Mar 21 '25 16:03 audiodude

> Do we want to modify the scraper to use this 900px information and automatically resize the image retrieved from the mediawiki ?

I guess we have no alternative? This would be an extension of the media optimisation feature set?

kelson42 avatar Apr 02 '25 18:04 kelson42

Looks so

benoit74 avatar Apr 03 '25 12:04 benoit74