benoit74
Given the commit messages, I'm not sure the fix is that easy; it looks like we already tried to do it correctly but missed some edge cases 🤣
I concur with this analysis
I feel like the decision to go back to libzim 6 behavior, with a fallback to libzim 7, is doomed to fail for almost all scrapers except mwoffliner. You mentioned `sotoki` and `nautilus`,...
There is a MediaWiki at https://oeis.org/wiki/. We cannot scrape MediaWiki websites with zimit unless a very special configuration is put in place (and even that is not recommended). There is 49...
Note: task [e130bc44-0dfc-4901-ba92-1cf894731d05](https://farm.openzim.org/pipeline/e130bc44-0dfc-4901-ba92-1cf894731d05) is marked as succeeded, but in fact the crawler crashed with a "Browser disconnected (crashed?), interrupting crawl" message, so the ZIM is not usable.
I've disabled the recipe, which was still running but was wrong and not working. @tdeitch can you please explain what you are interested in putting into a ZIM from the website? Is it...
@tdeitch no worries, and thanks a lot for the clarification. I expected your answer but didn't want to bias the request based on my own biases ^^ Due to the...
Note: I've deleted https://farm.openzim.org/recipes/oeis.org_en_all since it made no sense
One sample case (YouTube thumbnail/placeholder images for videos in the embedded player) where the current fuzzy rules system is insufficient: https://github.com/openzim/warc2zim/issues/262#issuecomment-2124084341