Automatically ignore ZIM resources found on a website to crawl

Open benoit74 opened this issue 5 months ago • 1 comments

If the crawler encounters a ZIM file while crawling a website, we should block it immediately so that it is included neither in the WARC nor, subsequently, in the ZIM.

This is probably a page block rule to be implemented in Browsertrix Crawler.
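To make the idea concrete, here is a minimal Python sketch of the kind of check such a rule could perform. It is not Browsertrix Crawler code: the function name `looks_like_zim` and the `first_bytes` parameter are hypothetical, and matching on a `.zim` path suffix is an assumption about how these files are usually served. The magic-number fallback relies on the ZIM header starting with 72173914 stored little-endian.

```python
import re
from urllib.parse import urlsplit

# Assumption: ZIM downloads are usually served from a path ending in ".zim".
ZIM_PATH_RE = re.compile(r"\.zim$", re.IGNORECASE)

# A ZIM file starts with the magic number 72173914, stored little-endian.
ZIM_MAGIC = (72173914).to_bytes(4, "little")


def looks_like_zim(url: str, first_bytes: bytes = b"") -> bool:
    """Return True if the resource should be treated as a ZIM file.

    Hypothetical helper: checks the URL path first, then falls back to
    the first bytes of the response body when they are available.
    """
    path = urlsplit(url).path  # ignores the query string and fragment
    if ZIM_PATH_RE.search(path):
        return True
    return first_bytes.startswith(ZIM_MAGIC)
```

For example, `looks_like_zim("https://example.org/files/wikipedia_en.zim")` matches on the path alone, while a page such as `https://example.org/zim/index.html` does not. A URL-only rule would be cheaper since it avoids fetching anything, and the byte check would only matter for downloads served without a `.zim` suffix.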

I don't think we need a switch to disable this blocking; I don't see a scenario where it would make sense to ZIM a ZIM inside a ZIM ^^

benoit74 · Sep 19 '24 12:09