benoit74

Results 370 issues of benoit74

See https://github.com/webrecorder/browsertrix-crawler/issues/630

upstream

See https://github.com/openzim/zim-requests/issues/1059

bug
recipe

In https://github.com/openzim/warc2zim/pull/306, we changed the way we detect the type of content and introduced a warning intended to help diagnose potential issues with this significant change. Once this has...

enhancement

Currently, non-GET (POST, PUT, ...) requests returning an HTML document are supposed to work, but they are not tested at all. They are supposed to work based on what has...

enhancement

Currently, JSONP support is not tested at all. It is supposed to work based on what has been transferred from wabac.js, but this has not been verified. We need to: - create...

enhancement

Do we want to raise a warning in the logs (or fail the scraper?) when we have two WARC records leading to the same ZIM Path, most probably due to...
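As a rough illustration of the check being discussed, the sketch below groups records by their target path and reports any path claimed more than once. It assumes the ZIM path of each record has already been computed; `find_path_collisions` and the sample pairs are hypothetical and invented for the example, not taken from warc2zim.

```python
from collections import defaultdict
from typing import Iterable


def find_path_collisions(
    records: Iterable[tuple[str, str]],
) -> dict[str, list[str]]:
    """Group record URLs by target path; keep paths claimed more than once.

    `records` yields (record_url, zim_path) pairs. How the path is derived
    from the URL is outside the scope of this sketch.
    """
    by_path: dict[str, list[str]] = defaultdict(list)
    for url, zim_path in records:
        by_path[zim_path].append(url)
    return {path: urls for path, urls in by_path.items() if len(urls) > 1}


# Invented sample data: two records end up on the same path.
sample = [
    ("https://example.com/a?x=1", "example.com/a"),
    ("https://example.com/a?x=2", "example.com/a"),
    ("https://example.com/b", "example.com/b"),
]

collisions = find_path_collisions(sample)
for path, urls in collisions.items():
    # A scraper could log this as a warning or raise instead.
    print(f"warning: {len(urls)} records map to {path!r}: {urls}")
```

Whether such a collision should only be logged or should fail the run is exactly the open question of the issue; the sketch only shows the detection step.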

enhancement
question

Currently, warc2zim is very permissive regarding issues that may arise while rewriting documents. This is mostly mandatory due to: - the nature of websites encountered in the wild, which are...

enhancement
question

Fix #370

Changes:
- use a generic class to automatically compute the function signature at rule initialization
- use the cached value at "runtime"
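The idea in these changes can be sketched roughly as follows: inspect the rule function once, when the rule object is created, and rely only on the stored parameter names on every later call. The `Rule` class, its `apply` method and the two example functions are hypothetical names chosen for illustration; this is not warc2zim's actual code.

```python
import inspect
from typing import Any, Callable, Generic, TypeVar

T = TypeVar("T")


class Rule(Generic[T]):
    """Hypothetical wrapper that inspects its function once, at creation."""

    def __init__(self, func: Callable[..., T]) -> None:
        self.func = func
        # Signature inspection is done a single time here,
        # not on every call.
        self.accepted = frozenset(inspect.signature(func).parameters)

    def apply(self, **available: Any) -> T:
        # "Runtime" path: only the cached parameter names are consulted.
        kwargs = {k: v for k, v in available.items() if k in self.accepted}
        return self.func(**kwargs)


def rewrite_href(value: str, base_url: str) -> str:
    return base_url + value


def drop_attribute(value: str) -> None:
    return None


href_rule = Rule(rewrite_href)
drop_rule = Rule(drop_attribute)

# The caller offers everything it knows; each rule picks what it declared.
context = {"value": "/a.html", "base_url": "https://example.com", "tag": "a"}
print(href_rule.apply(**context))
print(drop_rule.apply(**context))
```

Moving the `inspect.signature` call out of the per-call path matters when a rule is applied very many times per document, since signature inspection is comparatively expensive.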

For a very small WARC like https://github.com/openzim/warc2zim/blob/main/tests/data-special/qsl.net-encoding-alias.warc.gz, it takes more than 2 minutes to build the ZIM. A flamegraph shows that most of the time is spent in the `rewrite_html`...

bug

See https://data.fs.usda.gov/geodata/rastergateway/states-regions/states.php

Seen on https://farm.zimit.kiwix.org/pipeline/f1a1f927-a785-4f8f-b0c6-c7d69e75ed14/debug

```
Traceback (most recent call last):
  File "/usr/bin/zimit", line 8, in <module>
    sys.exit(zimit.zimit())
             ^^^^^^^^^^^^^
  File "/app/zimit/lib/python3.12/site-packages/zimit/zimit.py", line 585, in zimit
    run(sys.argv[1:])
  File "/app/zimit/lib/python3.12/site-packages/zimit/zimit.py", line...
```

bug