XML incorrectly not-well-formed because of http in Link to Schema

Open • Bodensuri opened this issue 11 months ago • 1 comment

We are ingesting many XML files that JHOVE classifies as "not well-formed" although they are well-formed. Here is an example: 12745764.zip. These XML files were created by ABBYY FineReader. They contain an http link to a schema. If "http" is changed to "https", JHOVE reports the file as well-formed. Because no XML version is declared at the top of the file, it is XML 1.0, and well-formedness in XML 1.0 does not depend on a schema at all. If the schema location were wrong, the file might be invalid, but it would still be well-formed.
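To illustrate the distinction between well-formedness and validity, here is a minimal Python sketch of a well-formedness-only check. It is not JHOVE's code (JHOVE is written in Java). It assumes a non-validating SAX parser, which treats `xsi:schemaLocation` as an ordinary attribute and never fetches the referenced schema, so whether that URL uses http or https cannot change the outcome. The helper name `check_well_formed` and the example.org URLs in the sample document are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


def check_well_formed(data: bytes) -> tuple[bool, str]:
    """Return (True, "") if data is well-formed XML, else (False, reason).

    Hypothetical helper: uses Python's non-validating expat SAX parser, so
    xsi:schemaLocation is just an attribute and no schema is downloaded.
    """
    parser = xml.sax.make_parser()
    # Do not resolve external general entities; checking the document's
    # own syntax does not require fetching anything over the network.
    parser.setFeature(feature_external_ges, False)
    # The default ContentHandler ignores all events; we only care whether
    # parsing finishes without a fatal error.
    parser.setContentHandler(ContentHandler())
    try:
        parser.parse(io.BytesIO(data))
    except xml.sax.SAXParseException as exc:
        return False, (f"{exc.getMessage()} "
                       f"(line {exc.getLineNumber()}, column {exc.getColumnNumber()})")
    return True, ""


# Hypothetical document with no XML declaration and an http schemaLocation.
sample = b"""<document xmlns="http://example.org/ns"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://example.org/ns http://example.org/schema.xsd">
  <page/>
</document>
"""

print(check_well_formed(sample))  # (True, '')
print(check_well_formed(b"<document><page></document>"))  # reports a mismatched tag
```

Because the parser never dereferences the schema URL, swapping http for https leaves the result unchanged; a wrong or unreachable schema location could only surface in a separate validation step.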

JhoveView (Rel. 1.28.0, 2023-05-18)
Date: 2024-03-25 18:58:12 MEZ
RepresentationInformation: C:\Users\rsuri\Downloads\12745764.xml
ReportingModule: XML-hul, Rel. 1.5.3 (2023-03-16)
LastModified: 2024-03-25 18:40:00 MEZ
Size: 829103
Format: XML
Status: Not well-formed
SignatureMatches: XML-hul
ErrorMessage: SAXParseException: Premature end of file. Line = -1, Column = -1.
  ID: XML-HUL-1
MIMEtype: text/xml

Bodensuri, Mar 25 '24 18:03

Thanks for reporting this. We will try to reproduce the issue and get back to you if we have questions.

carlwilson, Mar 28 '24 14:03