benoit74

Results 370 issues of benoit74

We regularly face situations where one would like to edit a single file stored in the ZIM. The typical situation I have in mind at the moment is Zimit, where we regularly have...

While ZIM metadata are supposed to be right from the beginning, we regularly face situations where the ZIM produced is OK in terms of content, but we have a few metadata...

I have set up a test page at https://website.test.openzim.org/form-get.html. This is a simplified version of something we have encountered in the wild on two occasions. First on https://chopin.lib.uchicago.edu/. If you open...

We have a website where some YouTube videos are not crawled. See, e.g., the page https://www.thejazzpianosite.com/jazz-piano-lessons/jazz-scales/melodic-minor-modes/. The first video `_UMejazqJqo` is crawled perfectly, but the second video `tMdes5dOxt8` is not. The HTML...

A few weeks ago, publishing jobs for dev and release suddenly jumped from less than 10 minutes to more than 40 minutes. ![Image](https://github.com/user-attachments/assets/2f6fcdee-1b33-4392-a61c-97e60a6e24e4) This is not normal and should at least...

bug

We have a website where some YouTube videos are not crawled. See, e.g., the page https://www.thejazzpianosite.com/jazz-piano-lessons/jazz-scales/melodic-minor-modes/. The first video `_UMejazqJqo` is crawled perfectly, but the second video `tMdes5dOxt8` is not. Reported upstream...

bug
upstream

https://github.com/openzim/zimit/issues/398 was fixed by skipping the YouTube test. We need to reactivate this test at some point.

enhancement

In some cases, the ZIM might have all YouTube videos broken with the message "Sign in to confirm you’re not a bot. This helps protect our community". This is not...

bug

If for some resources the crawler encounters a ZIM file on a web property, we should immediately block it so that it is not included inside the WARC and then...

enhancement

This issue serves as a checklist for the release event.

- [ ] Check that dependencies have been updated to the latest version (especially warc2zim in pyproject.toml and browsertrix crawler in...

task