der-postillon.com recipe is not working anymore
Recipe URL
https://farm.openzim.org/recipes/der-postillon.com
What is wrong
Why is the recipe disabled while we have a ZIM in production? Should we re-enable the recipe so the ZIM gets updated with Zimit 2?
Why do we have two files, one of them improperly labelled as fr? See https://library.kiwix.org/#lang=&q=postillon
Wasn't Der Postillon deprecated as a website?
How can the "French" site be 178GB?
There is a French site? Looks like this is just bad metadata, isn't it?
Yeah, hence the quotes. I meant to point at der-postillon.com_fr_all_2022-01: how can the file size grow 20x in the span of a month?
Not one month, two years. But I don't know; someone has to dig into the ZIM content to find the source of this huge difference.
I requested the recipe with Zimit2, let's see.
New ZIM produced with Zimit2: https://dev.library.kiwix.org/viewer#der-postillon.com_de_all_2024-06
I found at least two issues:
- custom CSS is needed to hide all the banners and ads that block the view; some useless white space is also added and should be hidden as well
- there is an issue with the images on the homepage, which probably deserves a fuzzy rule
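
For readers unfamiliar with the term, here is a minimal Python sketch of what a fuzzy rule does in general: it rewrites several variants of a URL to one canonical key, so a request for one variant can be served from a capture of another. The pattern below (collapsing `/s320/`-style size segments), the `FUZZY_RULES` list and the `fuzzy_key` helper are all hypothetical illustrations; this is not the rule format used by warc2zim, and the real homepage image URLs would have to be inspected before writing an actual rule.

```python
import re

# Hypothetical rule: drop an image-size path segment such as "/s320/" or
# "/w640-h480/" (and any query string) so that all size variants of the
# same image share one lookup key. The URL shape is an assumption.
FUZZY_RULES = [
    (
        re.compile(r"^https?://([^/]+)/(.+)/(?:s\d+|w\d+-h\d+)/([^/?#]+)(?:\?.*)?$"),
        r"\1/\2/\3",
    ),
]


def fuzzy_key(url: str) -> str:
    """Return the canonical key for url, or url unchanged if no rule matches."""
    for pattern, replacement in FUZZY_RULES:
        if pattern.match(url):
            return pattern.sub(replacement, url)
    return url


# Two size variants of the same (made-up) image map to the same key.
small = fuzzy_key("https://img.example.com/a/b/s320/photo.jpg")
large = fuzzy_key("https://img.example.com/a/b/s1600/photo.jpg?x=1")
print(small == large)
```

With a rule like this, the homepage asking for a size that was never captured could still be answered with the size that was, which is the usual motivation for fuzzy matching.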
The size is however "only" 5.5G, so I've deleted the fr variant from prod, which was 100+GB; it was just garbage.
See https://github.com/openzim/warc2zim/issues/330#issuecomment-2235328411. A browser profile is also needed to remove the cookie popup, scroll properly and fetch all images.