[Feature Request] Fetch original content and not shown images...
Again, more of a feature request:
"Fetch original content" together with the scraper is a life-saving tool. However, some pages lazy-load their images or place the image markup inside noscript tags, so the images are not shown in the reader.
A search-and-replace function (probably regex-based) would let us adjust the tags so that images are included properly in the article.
In the following example, the image is not shown because the page uses an "a-img" tag instead of a plain "img" tag:
<figure class="article-image">
<a-img width="16" height="9" layout="responsive" src="/imgs/18/2/7/6/0/7/1/8/Hassrede-4b031a0dc42fc046.jpeg" alt="Hassrede: Renate Künast geht gegen Gerichtsbeschluss zu Beschimpfungen im Netz vor" quality="85" high-dpi-quality="50" instant="" style="height: 0; padding-top: 56.25%;"><img class="a-size-defined" alt="Hassrede: Renate Künast geht gegen Gerichtsbeschluss zu Beschimpfungen im Netz vor" src="https://heise.cloudimg.io/width/610/q85.png-lossy-85.webp-lossy-85.foil1/_www-heise-de_/imgs/18/2/7/6/0/7/1/8/Hassrede-4b031a0dc42fc046.jpeg" style="display: block;"><div class="a-sizer" style="padding-top: 56.25%; display: block;"></div></a-img>
<noscript>
<img
src="https://heise.cloudimg.io/width/200/q50.png-lossy-50.webp-lossy-50.foil1/_www-heise-de_/imgs/18/2/7/6/0/7/1/8/Hassrede-4b031a0dc42fc046.jpeg"
srcset="https://heise.cloudimg.io/width/200/q30.png-lossy-30.webp-lossy-30.foil1/_www-heise-de_/imgs/18/2/7/6/0/7/1/8/Hassrede-4b031a0dc42fc046.jpeg 2x"
alt="Hassrede: Renate Künast geht gegen Gerichtsbeschluss zu Beschimpfungen im Netz vor"class=""
style="width:100%;"
>
</noscript>
</figure>
Sounds like an addition to https://github.com/miniflux/miniflux/blob/b6f3160dbc3efe7a86d39d526a1780eb320eefd4/reader/rewrite/rewrite_functions.go#L79
Somewhat related: some sites put a static base64-encoded image in the srcset attribute to show a loading placeholder, which prevents the browser from loading the actual image inside Miniflux.
Can be seen "in the wild" here: https://bahnblogstelle.net/2021/04/04/hamburger-entwickelt-suchmaschine-fuer-nachtzugreisen/ (the big image below the heading)
I managed to work around this by stripping the srcset attribute name with the following rewrite rule:
replace("srcset"|"")
After adding this rewrite rule all the images are finally shown.
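For anyone wondering what that rule does to the markup, here is a rough Python sketch. It only models the string replacement, not how Miniflux parses or sanitizes the result afterwards; the function names and the example.invalid URL are made up for illustration.

```python
import re

def strip_srcset_name(html: str) -> str:
    # Models replace("srcset"|""): only the word "srcset" is deleted,
    # the attribute value is left behind without a usable name.
    return re.sub("srcset", "", html)

def drop_srcset_attribute(html: str) -> str:
    # Hypothetical stricter variant: remove the whole attribute, value included.
    return re.sub(r'\s+srcset="[^"]*"', "", html)

img = '<img src="https://example.invalid/photo.jpg" srcset="data:image/gif;base64,R0lGOD 1x">'

stripped = strip_srcset_name(img)
dropped = drop_srcset_attribute(img)

print(stripped)
print(dropped)
```

In both cases the tag no longer carries a srcset attribute, so the placeholder no longer competes with src. Whether the stricter pattern can be written as a Miniflux rule (it contains double quotes) is something I have not checked.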
Thanks for the hint! That rewrite rule did the trick for me:
replace("<img "|"<ignore "),replace("a-img"|"img")
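To see why the order of the two replacements matters, here is a small Python sketch that applies them in sequence to a shortened version of the heise markup. It treats the rules as plain substring replacements, which is an assumption about how they behave for these particular patterns; the helper name and the example.invalid URL are invented for the example.

```python
def apply_replacements(html: str, rules: list[tuple[str, str]]) -> str:
    # Apply each (search, replacement) pair in order, like a comma-separated rule chain.
    for search, replacement in rules:
        html = html.replace(search, replacement)
    return html

snippet = (
    '<a-img src="/imgs/full.jpeg">'
    '<img class="a-size-defined" src="https://example.invalid/small.jpeg">'
    '</a-img>'
)

# First neutralize the inner <img>, then turn the outer <a-img> into <img>.
rules = [("<img ", "<ignore "), ("a-img", "img")]
result = apply_replacements(snippet, rules)

# Reversed order: the outer tag becomes <img> first and is then neutralized too.
reversed_result = apply_replacements(snippet, list(reversed(rules)))

print(result)
print(reversed_result)
```

With the original order, only the former a-img tag survives as an img element. With the order reversed, no img tag is left at all.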
For me, this loads the full-sized Shutterstock images, which are sometimes as large as 10 MB.
Instead, I used use_noscript_figure_images to pick the smaller images from the noscript part. In addition, removing .branding and footer led to a very clean article.
My complete rewrite rules for heise:
use_noscript_figure_images,remove(".branding,footer")
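As a rough illustration of the idea behind use_noscript_figure_images, here is a regex-based Python sketch that replaces the content of each figure with the img found in its noscript block. This is not Miniflux's implementation, just a simplified model under the assumption that the noscript block holds a single img tag; the function name and example.invalid URL are made up, and the remove(".branding,footer") part is not modeled.

```python
import re

FIGURE_RE = re.compile(r"<figure\b[^>]*>.*?</figure>", re.DOTALL | re.IGNORECASE)
NOSCRIPT_IMG_RE = re.compile(
    r"<noscript>\s*(<img\b[^>]*>)\s*</noscript>", re.DOTALL | re.IGNORECASE
)

def promote_noscript_images(html: str) -> str:
    def rewrite(match: re.Match) -> str:
        figure = match.group(0)
        fallback = NOSCRIPT_IMG_RE.search(figure)
        if fallback is None:
            return figure  # no noscript image: leave the figure untouched
        # Keep only the fallback image (figure attributes are dropped in this sketch).
        return "<figure>" + fallback.group(1) + "</figure>"
    return FIGURE_RE.sub(rewrite, html)

page = (
    '<figure class="article-image"><a-img src="/imgs/full.jpeg"></a-img>'
    '<noscript>\n<img src="https://example.invalid/small.jpeg">\n</noscript>'
    '</figure><p>Text</p>'
)
cleaned = promote_noscript_images(page)
print(cleaned)
```

The lazy-loading a-img wrapper disappears and only the smaller noscript image remains inside the figure.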