zim-tools
zimcheck wrongly warns about external links that are plain text
Sample HTML:
<p><img src="http://acme.com"></p>
Zimcheck result:
> zimcheck ./tests_en_zimcheck-issue_2025-12.zim
[INFO] Checking zim file ./tests_en_zimcheck-issue_2025-12.zim
[INFO] Zimcheck version is 3.6.0
[INFO] Verifying ZIM-archive structure integrity...
[INFO] Avoiding redundant checksum test (already performed by the integrity check).
[INFO] Checking metadata...
[INFO] Searching for Favicon...
[INFO] Searching for main page...
[INFO] Verifying Articles' content...
[INFO] Searching for redundant articles...
Verifying Similar Articles for redundancies...
[INFO] Checking for redirect loops...
[ERROR] Invalid external links found:
http://acme.com is an external dependence in article home
[INFO] Overall Test Status: Fail
[INFO] Total time taken by zimcheck: <3 seconds.
Expected result: PASS
Note: This affects <img> but not <a>.
Zimcheck used:
> zimcheck --version
zim-tools 3.6.0
libzim 9.3.0
+ libzstd 1.5.5
+ liblzma 5.2.6
+ libxapian 1.4.23
+ libicu 73.2.0
As long as we don't do proper HTML parsing, we will suffer from this kind of problem, IMHO.
Validating HTML without a real HTML parser is indeed prone to failure.
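To illustrate the point about parsing, here is a minimal sketch in Python using the standard library's `html.parser`. It is not how zimcheck is implemented. It assumes that "plain text" in this report means entity-escaped markup (for example `&lt;img ...&gt;`), and the names `ExternalSrcCollector` and `external_sources` are hypothetical. A real parser reports a `src` attribute only when it belongs to an actual element, so escaped markup in the text is not flagged.

```python
from html.parser import HTMLParser


class ExternalSrcCollector(HTMLParser):
    """Collect absolute http(s) URLs found in src attributes of real tags.

    Hypothetical helper for illustration only; not part of zim-tools.
    """

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        # Called only for genuine start tags, never for escaped text.
        for name, value in attrs:
            if name == "src" and value and value.startswith(("http://", "https://")):
                self.external.append(value)


def external_sources(html):
    """Return the external src URLs referenced by actual elements in html."""
    parser = ExternalSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.external


# A real <img> element is an external dependency.
real_tag = '<p><img src="http://acme.com"></p>'
# The same markup written as escaped text is only displayed, never fetched.
escaped_text = '<p>&lt;img src="http://acme.com"&gt;</p>'

print(external_sources(real_tag))
print(external_sources(escaped_text))
```

A pattern-based scan over the raw bytes sees `src="http://acme.com"` in both inputs, which may be the kind of false positive reported here; the parser-based version distinguishes the two because the escaped form reaches it as character data rather than as a tag.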