Incorrect processing of XHTML if XML declaration is missing?
Some of our XHTML 1.0 Transitional files have no XML declaration. For these files JHOVE reports HTML-HUL-16 ("Unrecognized or missing DOCTYPE declaration; validation continuing as HTML 3.2"). If I manually add the XML declaration, the document is processed as XHTML (by the XML module), and JHOVE then correctly finds, for example, an unclosed tag somewhere in the document.
According to the XHTML 1.0 specification, "An XML declaration is not required in all XML documents" (https://www.w3.org/TR/xhtml1/normative.html). For XHTML 1.1 the XML declaration is likewise a 'SHOULD', not a 'MUST'. JHOVE seems to expect that an XML declaration is always present.
Could this please be fixed, so that JHOVE correctly processes XHTML files without an XML declaration?
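To illustrate the behaviour being requested, here is a minimal Python sketch of format detection that does not depend on the XML declaration. It is not JHOVE code and says nothing about how JHOVE's HTML or XML modules are implemented; the helper name `looks_like_xhtml`, the two regular expressions, and the sample documents are all assumptions made for illustration.

```python
import re

# Hypothetical helper, NOT JHOVE's logic: treat a document as XHTML when it
# carries a W3C XHTML DOCTYPE or the XHTML namespace on <html>, whether or
# not an XML declaration is present.
XHTML_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML [^"]*"', re.IGNORECASE
)
XHTML_NAMESPACE = re.compile(
    r'<html\b[^>]*\bxmlns\s*=\s*["\']http://www\.w3\.org/1999/xhtml["\']',
    re.IGNORECASE,
)


def looks_like_xhtml(text, head_chars=2048):
    """Return True if the start of the document identifies it as XHTML."""
    head = text[:head_chars]
    return bool(XHTML_DOCTYPE.search(head) or XHTML_NAMESPACE.search(head))


# Made-up sample documents (the files from this report are not reproduced).
XHTML_WITHOUT_DECLARATION = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>Example</title></head><body></body></html>\n'
)
HTML_401 = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    '<html><head><title>Example</title></head><body></body></html>\n'
)

print(looks_like_xhtml(XHTML_WITHOUT_DECLARATION))
print(looks_like_xhtml(HTML_401))
```

The point of the sketch is only that the DOCTYPE and the namespace already identify the file as XHTML, so the XML declaration should not be needed to route it to XHTML processing.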
Example of problem (edit to see all markup):
JHOVE 1.26.1 output (Dutch):
Documents
  C:\Temp\example.htm
    Module
      HTML-hul
      Release: 1.4.2
      Date: 22-apr-2022
    RepInfo
      URI: C:\Temp\example.htm
      LastModified: Mon Mar 04 15:45:26 CET 2024
      Size: 534
      Format: HTML
      Status: Not well-formed
      Messages
        ErrorMessage: Onherkend of ontbrekende DOCTYPE declaratie; validatie wordt verder gezet als HTML 3.2
          ID: HTML-HUL-16
        InfoMessage: This HTML version is currently not supported, falling back to HTML 3.2
          ID: NO-ID
        ErrorMessage: Ongedefinieerd attribuut voor element
          ID: HTML-HUL-7
          SubMessage: Name = html, Attribute = xmlns, Line = 2, Column = 7
        ErrorMessage: De constructie met "/>" is onjuist, behalve in XHTML
          ID: NO-ID
          SubMessage: Name = meta, Line = 4, Column = 10
        ErrorMessage: De constructie met "/>" is onjuist, behalve in XHTML
          ID: NO-ID
          SubMessage: Name = link, Line = 6, Column = 10
      MimeType: text/html
Example with XML declaration added:
JHOVE 1.26.1 output (Dutch):
Documents
  C:\Temp\example_with_XML_declaration.htm
    Module
      XML-hul
      Release: 1.5.2
      Date: 22-apr-2022
    RepInfo
      URI: C:\Temp\example_with_XML_declaration.htm
      LastModified: Mon Mar 04 15:48:22 CET 2024
      Size: 574
      Format: XML
      Status: Not well-formed
      SignatureMatches
        XML-hul
      Messages
        ErrorMessage: SAXParseException
          ID: XML-HUL-1
          SubMessage: The element type "link" must be terminated by the matching end-tag "</link>". Line = 8, Column = 7.
      MimeType: text/xml
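For comparison, a generic XML parser does not need the XML declaration to detect an unclosed element. The sketch below uses Python's standard `xml.etree.ElementTree` rather than JHOVE; the helper name `well_formedness_error` and the two sample documents are made up for illustration and are not the files from this report.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return the parser's error message, or None if the text is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical XHTML samples with NO XML declaration.
DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
)
UNCLOSED_LINK = DOCTYPE + (
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head>\n'
    '<title>Example</title>\n'
    '<link rel="stylesheet" href="style.css">\n'  # missing "/>" or </link>
    '</head>\n'
    '<body></body>\n'
    '</html>\n'
)
CLOSED_LINK = UNCLOSED_LINK.replace('style.css">', 'style.css" />')

print(well_formedness_error(UNCLOSED_LINK))  # an error message from the parser
print(well_formedness_error(CLOSED_LINK))    # None
```

The parser here only checks well-formedness and does not fetch or validate against the DTD, so it is a narrower check than what the XML module reports above.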
Thanks for reporting this. We will try to reproduce the issue and get back to you if we have questions.