benoit74
Many tasks are running forever on Zimfarm while they were expected to complete quite fast (based on usual timings from previous runs). I had a look at mwoffliner2 and the...
Since Python 3.12, we have the following DeprecationWarning: ``` warcio/recordbuilder.py:156: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC:...
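For reference, the replacement that the warning itself suggests is a timezone-aware call. The sketch below is only an illustration of that pattern, not the actual warcio change: the helper names `utc_now_aware` and `format_utc_timestamp` are hypothetical, and the timestamp format shown is an assumption.

```python
from datetime import datetime, timezone


def utc_now_aware() -> datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


def format_utc_timestamp(dt: datetime) -> str:
    """Render an aware datetime as an ISO 8601 UTC string with a trailing 'Z'.

    Hypothetical helper: the exact format the affected code needs is an
    assumption made for this example.
    """
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(format_utc_timestamp(utc_now_aware()))
```

Unlike `utcnow()`, the returned object carries `tzinfo`, so comparing or subtracting it against a naive datetime raises a `TypeError` instead of silently mixing time zones.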
This was discovered in https://github.com/openzim/libzim/issues/865#issuecomment-2003154791 We have a test ZIM at https://tmp.kiwix.org/ci/zim_characters_encoding/characters_encoding.zim The last link (the one with all emojis) leads to a "text/plain" document with UTF-8 encoded...
Fix #392 (mostly, see NB below, but this is ok for me) # Changes - add a `failedLimit` CLI argument which interrupts the crawler if the number of failed pages...
Kiwix has a crawler which got stuck without returning, with 0.11.1 (i.e. with #385 merged). A last log is output and then the process is still up but nothing more seems...
Basically, when running the crawler with the official 0.12.2 Docker image on https://kiwix.org/fr/, the YouTube video on the home page is not in the WARCs: ``` docker run --rm -it -v...
README.md still mentions Chrome, although the crawler switched to Brave in 0.12. I think this should be fixed. Probably mention only Brave in the first paragraph, then add a detail...
The crawler should behave more appropriately when it encounters `HTTP 429 - Too Many Requests` errors. Below is an example log where the website requested the scraper to slow down...
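One possible shape for "behaving more appropriately" is to honor the standard `Retry-After` header when the server sends one and fall back to capped exponential backoff otherwise. The sketch below is a generic illustration in Python, not the crawler's implementation: the function name `retry_delay_seconds` and the `base` and `cap` defaults are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, attempt, now=None, base=1.0, cap=300.0):
    """Return how long to wait before retrying after an HTTP 429.

    retry_after: raw Retry-After header value (delay in seconds or an
                 HTTP date), or None if the header is absent.
    attempt:     zero-based retry counter used for the backoff fallback.
    base, cap:   hypothetical tuning knobs, in seconds.
    """
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            # Header given as a delay in seconds.
            return min(float(value), cap)
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # Unparseable header: use the fallback below.
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            # Header given as an HTTP date: wait until then, never negative.
            return min(max((when - now).total_seconds(), 0.0), cap)
    # No usable header: exponential backoff, capped.
    return min(base * (2 ** attempt), cap)
```

Capping the delay keeps a misbehaving or hostile server from stalling a crawl indefinitely; whether a cap is appropriate, and what value it should take, depends on the crawl.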
We have to fix the situation where YouTube videos are not working everywhere. We typically know that they do not play in kiwix-serve on Android Firefox / Chrome (while they...
Browsertrix crawler: version 1.0.0-beta.6 This occurred on Zimit 2 but might be unrelated to it, since it could be either a crawler problem or a Docker / Zimfarm issue....