
Retrieving the upstream favicon to set the 48x48 illustration is not smart enough

Open kelson42 opened this issue 1 year ago • 5 comments

Scraping https://womenshistory.si.edu/, which declares an extensive set of good favicons/illustrations:

<link rel="icon" sizes="16x16" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/favicon-16x16.png" />
<link rel="icon" sizes="32x32" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/favicon-32x32.png" />
<link rel="icon" sizes="96x96" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/favicon-96x96.png" />
<link rel="icon" sizes="192x192" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/android-chrome-192x192.png" />
<link rel="apple-touch-icon" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-60x60.png" />
<link rel="apple-touch-icon" sizes="72x72" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-72x72.png" />
<link rel="apple-touch-icon" sizes="76x76" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-76x76.png" />
<link rel="apple-touch-icon" sizes="114x114" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-114x114.png" />
<link rel="apple-touch-icon" sizes="120x120" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-120x120.png" />
<link rel="apple-touch-icon" sizes="144x144" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-144x144.png" />
<link rel="apple-touch-icon" sizes="152x152" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-152x152.png" />
<link rel="apple-touch-icon" sizes="180x180" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-180x180.png" />
<link rel="apple-touch-icon-precomposed" sizes="180x180" href="https://womenshistory.si.edu//sites/default/themes/si_sawhm/favicons/apple-icon-precomposed.png" />

... but warc2zim does not seem able to find a good one.

I suspect it does not look at these newer favicon entries.

Might be related to https://github.com/openzim/warc2zim/issues/120
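To illustrate what "looking at" all of these entries could mean, here is a minimal sketch using Python's standard `html.parser` that records every icon-like `<link>` along with its `sizes` attribute. The class name `IconLinkCollector` and the set of accepted `rel` values are hypothetical choices based only on the tags listed above; this is not warc2zim's actual code.

```python
from html.parser import HTMLParser

# Assumption: these rel tokens are the "icon-like" ones worth considering,
# based on the tags listed in this issue. Not warc2zim's actual code.
ICON_RELS = {"icon", "apple-touch-icon", "apple-touch-icon-precomposed"}


class IconLinkCollector(HTMLParser):
    """Hypothetical helper: record (rel, href, sizes) for each icon-like <link>."""

    def __init__(self):
        super().__init__()
        self.icons = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        href = attributes.get("href")
        # rel is a space-separated token list, e.g. "shortcut icon".
        if href and ICON_RELS.intersection(rel.split()):
            # sizes stays None when the attribute is absent.
            self.icons.append((rel, href, attributes.get("sizes")))


page = """
<link rel="icon" sizes="96x96" href="/favicons/favicon-96x96.png" />
<link rel="apple-touch-icon" href="/favicons/apple-icon-60x60.png" />
<link rel="stylesheet" href="/style.css" />
"""
collector = IconLinkCollector()
collector.feed(page)
for rel, href, sizes in collector.icons:
    print(rel, href, sizes)
```

Note that the first `apple-touch-icon` in the list above has no `sizes` attribute, so any selection logic has to cope with a missing value.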

kelson42, Jul 18 '24 08:07

... but maybe we are facing a bigger scraping problem here.

kelson42, Jul 18 '24 09:07

This looks like a crawling issue: something on the site detected the crawler and prevented it from operating. There is nothing we can fix at the code level.

benoit74, Jul 18 '24 09:07

It looks like using "Pixel 5" as the mobile device does not trigger the "protection".

benoit74, Jul 18 '24 11:07

I believe the favicon taken as the illustration is not of high enough resolution.

kelson42, Jul 18 '24 12:07

I confirm that the scraper takes the first "icon". It should instead take the provided "sizes" into account and select the 48x48 one if present, or otherwise the biggest one (so that we resize from the highest resolution possible and avoid the side effects of downsizing from something too close to 48x48, which requires resampling fractions of pixels).
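The selection rule described above (exact 48x48 if declared, else the largest) could look roughly like the following Python sketch. The function names `parse_sizes` and `pick_icon` are hypothetical, "largest" is assumed to mean largest pixel area, and entries with no usable `sizes` (missing, or `any` as used for SVG icons) are assumed to rank last; warc2zim's real implementation may differ.

```python
import re

# One WxH token of the HTML sizes attribute, e.g. "48x48" or "48X48".
SIZE_RE = re.compile(r"^(\d+)[xX](\d+)$")


def parse_sizes(sizes):
    """Return [(width, height), ...]; [] if sizes is missing or not WxH (e.g. 'any')."""
    if not sizes:
        return []
    result = []
    for token in sizes.split():
        match = SIZE_RE.match(token)
        if match:
            result.append((int(match.group(1)), int(match.group(2))))
    return result


def pick_icon(candidates, target=48):
    """Hypothetical selector: candidates is a list of (href, sizes) in document order.

    Returns the first entry declaring exactly target x target; otherwise the entry
    with the largest declared area. Entries without usable sizes count as area 0,
    so the first candidate is the fallback. Returns None for an empty list.
    """
    best_href, best_area = None, -1
    for href, sizes in candidates:
        dims = parse_sizes(sizes)
        if (target, target) in dims:
            return href  # exact match: no resampling needed
        area = max((w * h for w, h in dims), default=0)
        if area > best_area:
            best_href, best_area = href, area
    return best_href


candidates = [
    ("favicon-16x16.png", "16x16"),
    ("favicon-96x96.png", "96x96"),
    ("apple-icon-60x60.png", None),
    ("android-chrome-192x192.png", "192x192"),
]
print(pick_icon(candidates))  # → android-chrome-192x192.png
```

Picking 192x192 over 60x60 here means a clean 4:1 reduction to 48x48 rather than a 1.25:1 one, which is the side effect the comment above wants to avoid.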

benoit74, Jul 18 '24 12:07