zimcheck wrongly warns about external links that are only plain text

Open benoit74 opened this issue 2 weeks ago • 3 comments

Sample HTML:

<p>&lt;img src="http://acme.com"&gt;</p>

Zimcheck result:

> zimcheck ./tests_en_zimcheck-issue_2025-12.zim
[INFO] Checking zim file ./tests_en_zimcheck-issue_2025-12.zim
[INFO] Zimcheck version is 3.6.0
[INFO] Verifying ZIM-archive structure integrity...
[INFO] Avoiding redundant checksum test (already performed by the integrity check).
[INFO] Checking metadata...
[INFO] Searching for Favicon...
[INFO] Searching for main page...
[INFO] Verifying Articles' content...
[INFO] Searching for redundant articles...
  Verifying Similar Articles for redundancies...
[INFO] Checking for redirect loops...
[ERROR] Invalid external links found:
  http://acme.com is an external dependence in article home
[INFO] Overall Test Status: Fail
[INFO] Total time taken by zimcheck: <3 seconds.

Expected result: Pass

Note: this affects <img> but not <a>.

Zimcheck used:

> zimcheck --version
zim-tools 3.6.0

libzim 9.3.0
+ libzstd 1.5.5
+ liblzma 5.2.6
+ libxapian 1.4.23
+ libicu 73.2.0

benoit74, Dec 11 '25 09:12

As long as we don't do proper HTML parsing, we will keep suffering from this kind of problem, IMHO.

kelson42, Dec 11 '25 09:12

Validating HTML without a real HTML parser is indeed prone to failure.

benoit74, Dec 11 '25 12:12
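
To illustrate the point about real HTML parsing, here is a minimal sketch in Python using the standard library's `html.parser`. It is not how zimcheck is implemented (zim-tools is C++); the class `ImgSrcFinder` and the function `find_external_img_sources` are hypothetical names, and the sketch assumes that only `<img src>` counts as an external dependency. A real parser hands escaped markup to the text handler, so the sample HTML from this issue produces no finding, while a genuine `<img>` tag still does.

```python
from html.parser import HTMLParser


class ImgSrcFinder(HTMLParser):
    """Collect external URLs found in real <img src="..."> attributes.

    Hypothetical helper for illustration; not part of zim-tools.
    """

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        # Only actual tags reach this handler; escaped text such as
        # "&lt;img ...&gt;" is delivered to handle_data instead.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.startswith(
                ("http://", "https://", "//")
            ):
                self.external.append(value)


def find_external_img_sources(html):
    """Return the external <img> sources found in an HTML string."""
    finder = ImgSrcFinder()
    finder.feed(html)
    finder.close()
    return finder.external


# Sample from this issue: the <img> is escaped, so it is plain text.
escaped = '<p>&lt;img src="http://acme.com"&gt;</p>'
# A genuine tag, which should still be reported.
real = '<p><img src="http://acme.com"></p>'

print(find_external_img_sources(escaped))
print(find_external_img_sources(real))
```

With a parser-based approach like this, the distinction between markup and text comes from the parser rather than from pattern matching on the raw bytes, which is where the false positive in this issue seems to originate.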