benoit74
See https://github.com/webrecorder/browsertrix-crawler/issues/627 for the issue with the seed page https://www.playmobil.com/fr-fr/tiny-house/71509.html?gad_source=1&gclid=CjwKCAjwuJ2xBhA3EiwAMVjkVK41oNKfKsuOcp6oXd4I1lLYXhgnB4PE3Yg8zSBMPb7jHvZEZbMdBRoCizIQAvD_BwE.
We need to automate testing of, and alerting on, important Zimit functions so that we become aware of breakage as soon as possible. Ideally we would like to run these tests...
This issue serves as a checklist for the release event. - [ ] Check that dependencies have been updated to the latest version (especially warc2zim in pyproject.toml and browsertrix crawler in...
See https://github.com/openzim/zim-requests/issues/277
I suggest merging the zimit and warc2zim repositories (into zimit), because the two projects are so intertwined that keeping them separate has significant downsides: - it is complex to...
Currently, we do not use a released warc2zim package; we build it from source, since the dependency is `warc2zim @ git+https://github.com/openzim/warc2zim@main`. The drawback is that for now warc2zim...
We have three limits that can stop the crawler in the middle of a run: - `--sizeLimit`: the maximum WARC size - `--timeLimit`: the maximum duration of the crawl -...
Add a basic unit testing structure so that subsequent changes will be easier to test: - add classes hiding (a bit) how the crawler and warc2zim are called - mock these...
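As a rough illustration of the idea above, here is a minimal sketch of a wrapper class plus a test that mocks the process launch. The `CrawlerRunner` class, its methods, and the `crawl` executable name and `--url` flag used here are assumptions made for the example, not the project's actual code; only `--timeLimit` is taken from the issues listed here.

```python
import subprocess
from unittest import mock


class CrawlerRunner:
    """Hypothetical wrapper hiding how the crawler process is launched."""

    def __init__(self, executable="crawl"):
        # Executable name is an assumption for this sketch.
        self.executable = executable

    def build_command(self, seed_url, extra_args=()):
        # Assemble the argument list without running anything,
        # so it can be asserted on directly in tests.
        return [self.executable, "--url", seed_url, *extra_args]

    def run(self, seed_url, extra_args=()):
        # Launch the crawler and report its exit code to the caller.
        completed = subprocess.run(
            self.build_command(seed_url, extra_args), check=False
        )
        return completed.returncode


def test_runner_invokes_crawler_without_starting_it():
    runner = CrawlerRunner()
    # Replace subprocess.run so no real crawler is started.
    with mock.patch("subprocess.run") as fake_run:
        fake_run.return_value = mock.Mock(returncode=0)
        exit_code = runner.run("https://example.com", ["--timeLimit", "60"])
    assert exit_code == 0
    fake_run.assert_called_once_with(
        ["crawl", "--url", "https://example.com", "--timeLimit", "60"],
        check=False,
    )
```

Keeping command construction in its own method means most tests can check the argument list without any mocking, and only the thin `run` method needs `subprocess.run` patched. A similar wrapper could be written for the warc2zim call.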
See https://github.com/webrecorder/browsertrix-crawler/issues/217
The Usage section of README.md is quite confusing: - it mentions that it is possible to run `warc2zim --help`, but this both does not work (you need to be inside the...