warc2zim
Raise warnings when there is a conflict of http/https and/or ports and/or ...
Do we want to raise a warning in the logs (or even fail the scraper?) when two WARC records lead to the same ZIM path, most probably because of a conflict between http and https URLs?
It would be great to display the warning only when the resources really differ, but HTTP redirections make this hard.
I'm not sure it is really worth it (we already log lots of "Skipping duplicate {url}, already added to ZIM" debug messages), so this has to be analyzed in detail.
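One way to explore the idea is a minimal Python sketch, not warc2zim's actual code. It assumes a hypothetical `zim_path_for()` mapping that drops the scheme and port, so `http://`, `https://` and `:8080` variants of a URL collide. The hypothetical `CollisionTracker` remembers a SHA-256 digest of each payload so it can tell identical duplicates from real conflicts. Redirects are handled with a rough heuristic: if either record has a 3xx status, the clash is logged at debug level only.

```python
import hashlib
import logging
from urllib.parse import urlsplit

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("collision-sketch")


def zim_path_for(url: str) -> str:
    """Hypothetical URL -> ZIM path mapping: scheme and port are dropped."""
    parts = urlsplit(url)
    path = (parts.hostname or "") + (parts.path or "/")
    if parts.query:
        path += "?" + parts.query
    return path


class CollisionTracker:
    """Remember which URL, payload digest and redirect flag produced each ZIM path."""

    def __init__(self) -> None:
        self.seen: dict[str, tuple[str, str, bool]] = {}

    def check(self, url: str, payload: bytes, status: int = 200) -> str:
        """Classify a record as "added", "duplicate", "redirect" or "conflict"."""
        zim_path = zim_path_for(url)
        digest = hashlib.sha256(payload).hexdigest()
        is_redirect = 300 <= status < 400
        if zim_path not in self.seen:
            self.seen[zim_path] = (url, digest, is_redirect)
            return "added"
        first_url, first_digest, first_redirect = self.seen[zim_path]
        if digest == first_digest:
            # Same content under another scheme/port: harmless, debug only.
            logger.debug("Skipping duplicate %s, already added to ZIM", url)
            return "duplicate"
        if is_redirect or first_redirect:
            # Heuristic: e.g. http:// 301 -> https:// maps both records to one path.
            logger.debug("Skipping %s for %s: redirect involved", url, zim_path)
            return "redirect"
        # Different content for the same ZIM path: worth a real warning.
        logger.warning(
            "ZIM path %s conflict: %s differs from already added %s",
            zim_path, url, first_url,
        )
        return "conflict"
```

Comparing full-payload digests is the simplest content check. It will still report a "conflict" for pages that differ only in dynamic bits such as timestamps or tokens, which is part of why the analysis is not trivial.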