
Error parsing schema

Open caylamason opened this issue 6 months ago • 7 comments

Hi there,

I am using bftools version 8.2.0. I attempted to validate the XML after successfully converting a czi file to ome.tiff and received the following error:

$bftools/xmlvalid lab_processed/images/sample.ome.tiff
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating lab_processed/images/sample.ome.tiff
Error parsing schema at http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw '301 Moved Permanently'.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaDOMParser.characters(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaParsingConfig.parse(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaParsingConfig.parse(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaDOMParser.parse(Unknown Source)
    at org.apache.xerces.impl.xs.traversers.XSDHandler.getSchemaDocument(Unknown Source)
    at org.apache.xerces.impl.xs.traversers.XSDHandler.parseSchema(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaLoader.loadSchema(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
    at org.apache.xerces.jaxp.validation.XMLSchemaFactory.newSchema(Unknown Source)
    at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:638)
    at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:670)
    at loci.common.xml.XMLTools.validateXML(XMLTools.java:871)
    at loci.common.xml.XMLTools.validateXML(XMLTools.java:785)
    at loci.formats.tools.XMLValidate.validate(XMLValidate.java:67)
    at loci.formats.tools.XMLValidate.validate(XMLValidate.java:104)
    at loci.formats.tools.XMLValidate.main(XMLValidate.java:125)

Based on the error message, it seems to be an issue with the schema formatting. Would you mind taking a look?

Thank you, Cayla

caylamason avatar May 29 '25 19:05 caylamason

@caylamason thanks for opening this issue. I can easily reproduce it using Bio-Formats 8.2.0 and any of the official OME-XML or OME-TIFF samples. More generally, the problem is not specific to Bio-Formats 8.2.0 and can be reproduced using the command-line utility for any Bio-Formats version.

The issue comes from the fact that each file stores a reference to the OME XSD schema using http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd, as per the specification. An HTTP -> HTTPS 301 redirect was recently introduced at the level of http://www.openmicroscopy.org, and some of the Java tooling fails to handle these redirects:

sbesson@Sebastiens-MacBook-Pro-3 Downloads % curl -IL http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
HTTP/1.1 301 Moved Permanently
Server: nginx/1.28.0
Date: Fri, 30 May 2025 13:31:17 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
Location: https://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd

HTTP/1.1 200 OK
Server: nginx/1.28.0
Date: Fri, 30 May 2025 13:31:17 GMT
Content-Type: application/octet-stream
Content-Length: 261500
Last-Modified: Wed, 28 May 2025 14:48:42 GMT
Connection: keep-alive
ETag: "6837224a-3fd7c"
Accept-Ranges: bytes
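
To read a redirect chain like the one above at a glance, the headers can be condensed to one line per response. This is only a minimal sketch: the function name `summarise_hops` is made up for illustration and is not part of curl or Bio-Formats, and it assumes the input is header output in the format printed by `curl -sIL`.

```shell
# Condense `curl -sIL` header output (read from stdin) to one line per response:
# the status code, followed by the Location target when the response is a redirect.
summarise_hops() {
  tr -d '\r' | awk '
    function emit() {
      if (code == "") return
      if (loc != "") { print code " -> " loc } else { print code }
    }
    /^HTTP\// { emit(); code = $2; loc = "" }      # a new response starts here
    tolower($1) == "location:" { loc = $2 }        # remember the redirect target
    END { emit() }
  '
}

# Usage (needs network access):
#   curl -sIL http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd | summarise_hops
```

A tool that stops at the first line of that summary, instead of following it to the final response, ends up parsing the redirect page rather than the schema.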

At the code level, there are a few possibilities to mitigate this issue:

  • update the low-level validation tools to support 301 redirects
  • resurrect #3268, which was a previous attempt to use the cached XSD schemas instead of making HTTP(S) requests. Note that this is already the strategy used when calling showinf -omexml
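
As a rough shell analogue of the second option (this is not the code from #3268, only an illustration of the caching idea), a schema URL can be mapped to a local file that is downloaded once and reused afterwards. The names `schema_cache_path` and `fetch_schema` and the default `~/.cache/ome-schemas` directory are assumptions made up for this sketch.

```shell
# Map a schema URL to a path under a local cache directory (hypothetical layout).
# usage: schema_cache_path <url> [cache_dir]
schema_cache_path() {
  url=$1
  cache_dir=${2:-"$HOME/.cache/ome-schemas"}
  # Drop the scheme so http:// and https:// URLs share one cache entry.
  rest=${url#*://}
  printf '%s/%s\n' "$cache_dir" "$rest"
}

# Print the cached path, downloading (and following redirects) only on a cache miss.
# usage: fetch_schema <url> [cache_dir]
fetch_schema() {
  path=$(schema_cache_path "$1" "$2")
  if [ ! -s "$path" ]; then
    mkdir -p "$(dirname "$path")" && curl -fsSL -o "$path" "$1" || return 1
  fi
  printf '%s\n' "$path"
}

# Usage (needs network access on the first call only):
#   xsd=$(fetch_schema http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd)
```

Once the file is cached, validation no longer depends on how the server answers plain HTTP requests.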

@jburel @pwalczysko could you comment on the infrastructure changes that have been made at the level of the OME resources?

sbesson avatar May 30 '25 13:05 sbesson

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/xmlvalid-error-parsing-schema/113698/2

imagesc-bot avatar Jun 18 '25 06:06 imagesc-bot

:+1: for the fix of xmlvalid in #4316, but do we know if the issue will affect others?

joshmoore avatar Jun 18 '25 06:06 joshmoore

do we know if the issue will affect others?

Possibly any tool that does not use a cached copy of ome.xsd and does not handle HTTP -> HTTPS 301 redirects.

Incidentally, I tried xmllint --schema which also fails:

sbesson@Sebastiens-MacBook-Pro-3 ome-model % xmllint --schema http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd  specification/samples/2016-06/ROI.ome.xml -noout
error : Unknown IO error
warning: failed to load external entity "http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd"
Schemas parser error : Failed to locate the main schema resource at 'http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd'.
WXS schema http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd failed to compile

Using a local copy of the schema works:

sbesson@Sebastiens-MacBook-Pro-3 ome-model % curl -L -o ome.xsd  http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   169  100   169    0     0   2110      0 --:--:-- --:--:-- --:--:--  2112
100  255k  100  255k    0     0   703k      0 --:--:-- --:--:-- --:--:-- 1302k
sbesson@Sebastiens-MacBook-Pro-3 ome-model % xmllint --schema ome.xsd specification/samples/2016-06/ROI.ome.xml -noout    
specification/samples/2016-06/ROI.ome.xml validates

as does validating against the schema served from a local deployment of http://github.com/ome/www.openmicroscopy.org:

sbesson@Sebastiens-MacBook-Pro-3 ome-model % xmllint --schema http://0.0.0.0:4000/www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd  specification/samples/2016-06/ROI.ome.xml -noout
specification/samples/2016-06/ROI.ome.xml validates

sbesson avatar Jun 18 '25 08:06 sbesson

Any known workaround for the upstream issue? - If I understand correctly, all versions of xmlvalid are currently affected by the forced SSL redirect.

christianrickert avatar Jun 18 '25 17:06 christianrickert

@christianrickert You are correct that all released versions of xmlvalid are currently broken. My personal opinion is that, if possible, the OME schemas should remain available under HTTP. This might need to be balanced against the reasons that motivated the unilateral redirection of all traffic from HTTP to HTTPS. Ultimately this decision belongs to the academically funded teams maintaining the OME website.

In the meantime, a workaround with the current infrastructure is to download the OME schema locally and use xmllint as discussed above, i.e.

curl -L -o ome.xsd  http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
tiffcomment image.ome.tif | xmllint --schema ome.xsd -noout -
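
For repeated use, the workaround can be wrapped in a small function. This is a sketch only: the name `validate_ome` is invented here, it assumes `xmllint` and the Bio-Formats `tiffcomment` tool are on the PATH, and it chooses the OME-TIFF branch purely from the file extension.

```shell
# Validate OME metadata against a local copy of the schema.
# usage: validate_ome <schema.xsd> <file.ome.xml | file.ome.tif | file.ome.tiff>
validate_ome() {
  schema=$1
  input=$2
  case $input in
    *.tif|*.tiff)
      # OME-TIFF: the OME-XML is stored in the TIFF comment, so extract it
      # and let xmllint read it from stdin (the trailing "-").
      tiffcomment "$input" | xmllint --schema "$schema" --noout -
      ;;
    *)
      # Anything else is treated as a plain OME-XML file.
      xmllint --schema "$schema" --noout "$input"
      ;;
  esac
}

# Usage:
#   validate_ome ome.xsd image.ome.tif
#   validate_ome ome.xsd specification/samples/2016-06/ROI.ome.xml
```

The exit status is the one returned by xmllint, so the function can be used directly in a script condition.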

sbesson avatar Jun 18 '25 20:06 sbesson

@christianrickert You are correct that all released versions of xmlvalid are currently broken. My personal opinion is that, if possible, the OME schemas should remain available under HTTP. This might need to be balanced against the reasons that justified the unilateral redirect from HTTP to HTTPS and ultimately this decision belongs to the academically funded teams maintaining the OME website.

Agreed.

In the meantime, a workaround with the current infrastructure is to download the OME schema locally and use xmllint as discussed above, i.e.

curl -L -o ome.xsd  http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
tiffcomment image.ome.tif | xmllint --schema ome.xsd -noout -

That's a neat trick! - I saw your xmllint code above but didn't make the connection (--schema schema : do validation against the WXS schema) to xmlvalid.

Thank you very much for your help!

christianrickert avatar Jun 18 '25 21:06 christianrickert

Prior to the change

tools/xmlvalid ~/Desktop/course_lif/output.ome.tiff
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating /Users/jmarie/Desktop/course_lif/output.ome.tiff
Error parsing schema at http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw '301 Moved Permanently'.

After the change in the nginx configuration

tools/xmlvalid ~/Desktop/course_lif/output.ome.tiff
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating /Users/jmarie/Desktop/course_lif/output.ome.tiff
No validation errors found.

Let me know if there is any issue.

jburel avatar Jun 30 '25 19:06 jburel

Thanks @jburel. Works for me with 8.2.0:

$ wget https://downloads.openmicroscopy.org/images/OME-XML/2016-06/hcs.ome.xml
...
$ xmlvalid hcs.ome.xml 
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating hcs.ome.xml
No validation errors found.
$ curl -IL http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
HTTP/1.1 200 OK
Server: nginx/1.28.0
Date: Mon, 30 Jun 2025 19:44:58 GMT
Content-Type: application/octet-stream
Content-Length: 261500
Last-Modified: Fri, 27 Jun 2025 12:06:59 GMT
Connection: keep-alive
ETag: "685e8963-3fd7c"
Accept-Ranges: bytes

melissalinkert avatar Jun 30 '25 19:06 melissalinkert

Thanks for implementing the change.

sbesson avatar Jul 01 '25 07:07 sbesson