
Incorrect processing of XHTML if XML declaration is missing?

Open RvanVeenendaal opened this issue 11 months ago • 1 comments

In some of our XHTML 1.0 Transitional files the XML declaration is missing. As a result, JHOVE reports HTML-HUL-16 ("Unrecognized or missing DOCTYPE declaration; validation continuing as HTML 3.2"). If I manually add the XML declaration, the document is processed as XHTML (by the XML module), and JHOVE then correctly finds, for example, an unclosed tag somewhere in the document.

According to the XHTML specifications, "An XML declaration is not required in all XML documents" (https://www.w3.org/TR/xhtml1/normative.html). For XHTML 1.1 the XML declaration is likewise a 'SHOULD', not a 'MUST'. It seems that JHOVE expects an XML declaration to always be present.
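To illustrate the point about the specification, here is a minimal Python sketch showing that a generic XML parser accepts an XHTML 1.0 Transitional document that has no XML declaration, and still catches an unclosed element. The sample markup is a hypothetical stand-in (the markup from this report is not preserved above), and `is_well_formed` is a name made up for this example. It says nothing about how JHOVE itself is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: XHTML 1.0 Transitional WITHOUT an XML declaration.
# This is a stand-in, not the file from the report.
XHTML_NO_DECL = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>TITLE</title>
    <link rel="stylesheet" href="style.css" />
  </head>
  <body><p>TEXT</p></body>
</html>
"""

# The same document with the link element left unclosed.
XHTML_UNCLOSED = XHTML_NO_DECL.replace('href="style.css" />', 'href="style.css">')


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(XHTML_NO_DECL))   # declaration absent, still parses
    print(is_well_formed(XHTML_UNCLOSED))  # unclosed link is rejected
```

This only checks well-formedness. It does not fetch the DTD or validate against it.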

Could this please be fixed, so that JHOVE correctly processes XHTML files without an XML declaration?
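One way to picture the requested behaviour is to decide "this is XHTML" from the DOCTYPE public identifier instead of from the XML declaration. The Python sketch below is only an illustration under that assumption. The names `looks_like_xhtml` and `has_xml_declaration` are hypothetical, and this is not JHOVE's actual module-selection logic.

```python
import re

# Matches an XHTML DOCTYPE by its public identifier, for example
# "-//W3C//DTD XHTML 1.0 Transitional//EN" or "-//W3C//DTD XHTML 1.1//EN".
XHTML_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+["\']-//W3C//DTD XHTML [^"\']*["\']',
    re.IGNORECASE,
)

# An XML declaration, if present, must be the very first thing in the file
# (an optional byte order mark is tolerated here).
XML_DECL = re.compile(r'^\ufeff?<\?xml\s')


def has_xml_declaration(head: str) -> bool:
    """Return True if the text starts with an XML declaration."""
    return XML_DECL.match(head) is not None


def looks_like_xhtml(head: str) -> bool:
    """Hypothetical check: an XHTML DOCTYPE is enough, declaration or not."""
    return XHTML_DOCTYPE.search(head) is not None


if __name__ == "__main__":
    doctype = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    )
    without_decl = doctype + '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
    with_decl = '<?xml version="1.0" encoding="UTF-8"?>\n' + without_decl
    for sample in (without_decl, with_decl):
        print(has_xml_declaration(sample), looks_like_xhtml(sample))
```

With a check along these lines, both variants of the example file would be treated as XHTML, so the unclosed tag would be reported in either case.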

Example of problem (edit to see all markup):

TITLE
TEXT

JHOVE 1.26.1 output (Dutch):

Documents
C:\Temp\example.htm
Module HTML-hul
  Release: 1.4.2
  Date: 22-apr-2022
RepInfo
  URI: C:\Temp\example.htm
  LastModified: Mon Mar 04 15:45:26 CET 2024
  Size: 534
  Format: HTML
  Status: Not well-formed
  Messages
    ErrorMessage: Onherkend of ontbrekende DOCTYPE declaratie; validatie wordt verder gezet als HTML 3.2
      ID: HTML-HUL-16
    InfoMessage: This HTML version is currently not supported, falling back to HTML 3.2
      ID: NO-ID
    ErrorMessage: Ongedefinieerd attribuut voor element
      ID: HTML-HUL-7
      SubMessage: Name = html, Attribute = xmlns, Line = 2, Column = 7
    ErrorMessage: De constructie met "/>" is onjuist, behalve in XHTML
      ID: NO-ID
      SubMessage: Name = meta, Line = 4, Column = 10
    ErrorMessage: De constructie met "/>" is onjuist, behalve in XHTML
      ID: NO-ID
      SubMessage: Name = link, Line = 6, Column = 10
  MimeType: text/html

Example with XML declaration added:

TITLE
TEXT

JHOVE 1.26.1 output (Dutch):

Documents
C:\Temp\example_with_XML_declaration.htm
Module XML-hul
  Release: 1.5.2
  Date: 22-apr-2022
RepInfo
  URI: C:\Temp\example_with_XML_declaration.htm
  LastModified: Mon Mar 04 15:48:22 CET 2024
  Size: 574
  Format: XML
  Status: Not well-formed
  SignatureMatches
    XML-hul
  Messages
    ErrorMessage: SAXParseException
      ID: XML-HUL-1
      SubMessage: The element type "link" must be terminated by the matching end-tag "</link>". Line = 8, Column = 7.
  MimeType: text/xml

RvanVeenendaal, Mar 04 '24 14:03

Thanks for reporting this. We will try to reproduce the issue and get back to you if we have questions.

carlwilson, Mar 28 '24 14:03