
der-postillon.com recipe is not working anymore

Open benoit74 opened this issue 1 year ago • 7 comments

Recipe URL

https://farm.openzim.org/recipes/der-postillon.com

What is wrong

Why is the recipe disabled while we have a ZIM in production? Should we re-enable the recipe to update the ZIM with Zimit 2?

Why do we have two files, including one improperly labelled as fr? See https://library.kiwix.org/#lang=&q=postillon

benoit74 avatar Jun 13 '24 14:06 benoit74

Wasn't Der Postillon deprecated as a website?

How can the "French" site be 178GB??

Popolechien avatar Jun 13 '24 14:06 Popolechien

Is there a French site? This looks like just bad metadata, doesn't it?

benoit74 avatar Jun 13 '24 15:06 benoit74

Yeah, hence the quotes. I meant to point at der-postillon.com_fr_all_2022-01: how did the file size grow 20x in the span of a month?

Popolechien avatar Jun 17 '24 15:06 Popolechien

Not one month, two years. But I don't know, one has to dig inside the ZIM content to find the source of this huge difference.

benoit74 avatar Jun 17 '24 19:06 benoit74

I requested the recipe with Zimit2, let's see

benoit74 avatar Jun 21 '24 07:06 benoit74

New ZIM produced with Zimit2: https://dev.library.kiwix.org/viewer#der-postillon.com_de_all_2024-06

I found at least two issues:

  • one needs to develop custom CSS to hide all the banners and ads that block the way; some white space is also added for no reason and should be hidden as well
  • there is an issue with the images on the homepage, which probably deserves a fuzzy rule

The size is, however, "only" 5.5 GB, so I've deleted the fr variant from prod, which was 100+ GB; it was just garbage.

benoit74 avatar Jun 24 '24 08:06 benoit74

See https://github.com/openzim/warc2zim/issues/330#issuecomment-2235328411: a browser profile is also needed to remove the cookie popup and to properly scroll and fetch all images.

benoit74 avatar Jul 18 '24 04:07 benoit74