bioformats
Error parsing schema
Hi there,
I am using bftools version 8.2.0. I attempted to validate the XML after successfully converting a czi file to ome.tiff and received the following error:
$bftools/xmlvalid lab_processed/images/sample.ome.tiff
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating lab_processed/images/sample.ome.tiff
Error parsing schema at http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw '301 Moved Permanently'.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaDOMParser.characters(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaParsingConfig.parse(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaParsingConfig.parse(Unknown Source)
    at org.apache.xerces.impl.xs.opti.SchemaDOMParser.parse(Unknown Source)
    at org.apache.xerces.impl.xs.traversers.XSDHandler.getSchemaDocument(Unknown Source)
    at org.apache.xerces.impl.xs.traversers.XSDHandler.parseSchema(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaLoader.loadSchema(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
    at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
    at org.apache.xerces.jaxp.validation.XMLSchemaFactory.newSchema(Unknown Source)
    at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:638)
    at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:670)
    at loci.common.xml.XMLTools.validateXML(XMLTools.java:871)
    at loci.common.xml.XMLTools.validateXML(XMLTools.java:785)
    at loci.formats.tools.XMLValidate.validate(XMLValidate.java:67)
    at loci.formats.tools.XMLValidate.validate(XMLValidate.java:104)
    at loci.formats.tools.XMLValidate.main(XMLValidate.java:125)
Based on the error message, it seems to be an issue with the schema formatting. Would you mind taking a look?
Thank you, Cayla
@caylamason thanks for opening this issue. I can easily reproduce it using Bio-Formats 8.2.0 and any of the official OME-XML or OME-TIFF samples. The problem is not specific to Bio-Formats 8.2.0: it can be reproduced with the command-line utility from any Bio-Formats version.
The issue comes from the fact that each file stores a reference to the OME XSD schema using http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd as per the specification. An HTTP -> HTTPS 301 redirect was recently introduced at the level of http://www.openmicroscopy.org, and some of the Java tooling fails to handle these redirects:
sbesson@Sebastiens-MacBook-Pro-3 Downloads % curl -IL http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
HTTP/1.1 301 Moved Permanently
Server: nginx/1.28.0
Date: Fri, 30 May 2025 13:31:17 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
Location: https://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
HTTP/1.1 200 OK
Server: nginx/1.28.0
Date: Fri, 30 May 2025 13:31:17 GMT
Content-Type: application/octet-stream
Content-Length: 261500
Last-Modified: Wed, 28 May 2025 14:48:42 GMT
Connection: keep-alive
ETag: "6837224a-3fd7c"
Accept-Ranges: bytes
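The "Saw '301 Moved Permanently'" fragment in the stack trace suggests the parser was handed the body of the redirect response instead of the schema. Below is a minimal Python sketch of a guard against that. It assumes the 169-byte text/html body is the stock nginx redirect page; the page content and the `is_xsd` helper are illustrative and not taken from Bio-Formats.

```python
import xml.etree.ElementTree as ET

# Qualified name of the root element of any W3C XML Schema document.
XSD_ROOT = "{http://www.w3.org/2001/XMLSchema}schema"

# ASSUMPTION: typical nginx body for a 301 response; not shown in this thread.
REDIRECT_PAGE = b"""<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.28.0</center>
</body>
</html>
"""


def is_xsd(payload: bytes) -> bool:
    """Return True only if payload is well-formed XML rooted at xs:schema."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Not well-formed XML at all (for example an HTML error page).
        return False
    return root.tag == XSD_ROOT
```

A caller could run `is_xsd` on whatever it downloaded before passing it to a schema compiler, and report "got a redirect page, not a schema" instead of a low-level parser error.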
At the code level, there are a few possibilities to mitigate this issue:
- update the low-level validation tools to support 301 redirects
- resurrect #3268, a previous attempt to use the cached XSD schemas instead of making HTTP(S) requests. Note this is already the strategy used when calling showinf -omexml
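The cached-schema strategy can be sketched in Python as a cache-first lookup. This is only an illustration of the idea, not the approach taken in #3268: the function names, the cache layout, and the injectable `fetch` parameter are all assumptions.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlsplit
from urllib.request import urlopen


def cached_schema_path(schema_url: str, cache_dir: Path) -> Path:
    """Map a schema URL to a file under cache_dir, ignoring the URL scheme."""
    parts = urlsplit(schema_url)
    # http:// and https:// variants deliberately share one cache entry.
    return cache_dir / parts.netloc / parts.path.lstrip("/")


def fetch_over_network(schema_url: str) -> bytes:
    """Download the schema; urllib follows the HTTP -> HTTPS redirect itself."""
    with urlopen(schema_url, timeout=30) as response:
        return response.read()


def load_schema(
    schema_url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = fetch_over_network,
) -> bytes:
    """Return the schema bytes, touching the network only on a cache miss."""
    path = cached_schema_path(schema_url, cache_dir)
    if path.is_file():
        return path.read_bytes()
    data = fetch(schema_url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

With a pre-populated cache, validation no longer depends on how the web server answers requests for the schema URL.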
@jburel @pwalczysko could you comment on the infrastructure changes that have been made at the level of the OME resources?
This issue has been mentioned on Image.sc Forum. There might be relevant details there:
https://forum.image.sc/t/xmlvalid-error-parsing-schema/113698/2
:+1: for the fix of xmlvalid in #4316, but do we know if the issue will affect others?
do we know if the issue will affect others?
Possibly any tool that does not use a cached copy of ome.xsd and does not handle HTTP -> HTTPS 301 redirects.
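To check whether a given schema URL would expose a tool to this problem, one option is to walk the redirect chain by hand. The Python sketch below uses `http.client`, which never follows redirects on its own; `head_request`, `redirect_chain`, and `switches_scheme` are hypothetical helper names chosen for illustration.

```python
from http.client import HTTPConnection, HTTPSConnection
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_request(url: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=30)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_chain(
    url: str,
    head: Callable[[str], Tuple[int, Optional[str]]] = head_request,
    max_hops: int = 5,
) -> List[str]:
    """Return every URL visited, starting with url; give up after max_hops requests."""
    chain = [url]
    for _ in range(max_hops):
        status, location = head(chain[-1])
        if status not in REDIRECT_CODES or not location:
            return chain
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    raise RuntimeError(f"more than {max_hops} requests starting at {url}")


def switches_scheme(chain: List[str]) -> bool:
    """True if the chain moves between schemes, e.g. http -> https."""
    return len({urlsplit(u).scheme for u in chain}) > 1
```

A chain for which `switches_scheme` returns True is the situation described in this thread: clients that refuse or ignore cross-scheme redirects end up with the redirect page instead of the schema.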
Incidentally, I tried xmllint --schema which also fails:
sbesson@Sebastiens-MacBook-Pro-3 ome-model % xmllint --schema http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd specification/samples/2016-06/ROI.ome.xml -noout
error : Unknown IO error
warning: failed to load external entity "http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd"
Schemas parser error : Failed to locate the main schema resource at 'http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd'.
WXS schema http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd failed to compile
Using a local copy of the schema works:
sbesson@Sebastiens-MacBook-Pro-3 ome-model % curl -L -o ome.xsd http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 169 100 169 0 0 2110 0 --:--:-- --:--:-- --:--:-- 2112
100 255k 100 255k 0 0 703k 0 --:--:-- --:--:-- --:--:-- 1302k
sbesson@Sebastiens-MacBook-Pro-3 ome-model % xmllint --schema ome.xsd specification/samples/2016-06/ROI.ome.xml -noout
specification/samples/2016-06/ROI.ome.xml validates
as does validating against the schema served from a local deployment of http://github.com/ome/www.openmicroscopy.org
sbesson@Sebastiens-MacBook-Pro-3 ome-model % xmllint --schema http://0.0.0.0:4000/www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd specification/samples/2016-06/ROI.ome.xml -noout
specification/samples/2016-06/ROI.ome.xml validates
Any known workaround for the upstream issue? - If I understand correctly, all versions of xmlvalid are currently affected by the forced SSL redirect.
@christianrickert You are correct that all released versions of xmlvalid are currently broken. My personal opinion is that, if possible, the OME schemas should remain available under HTTP. This might need to be balanced against the reasons that motivated the unilateral redirection of all traffic from HTTP to HTTPS. Ultimately this decision belongs to the academically funded teams maintaining the OME website.
In the meantime, a workaround with the current infrastructure is to download the OME schema locally and use xmllint as discussed above, i.e.
curl -L -o ome.xsd http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
tiffcomment image.ome.tif | xmllint --schema ome.xsd -noout
@christianrickert You are correct that all released versions of xmlvalid are currently broken. My personal opinion is that, if possible, the OME schemas should remain available under HTTP. This might need to be balanced against the reasons that justified the unilateral redirect from HTTP to HTTPS and ultimately this decision belongs to the academically funded teams maintaining the OME website.
Agreed.
In the meantime, a workaround with the current infrastructure is to download the OME schema locally and use xmllint as discussed above, i.e.
curl -L -o ome.xsd http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
tiffcomment image.ome.tif | xmllint --schema ome.xsd -noout
That's a neat trick! - I saw your xmllint code above but didn't make the connection (--schematron schema : do validation against a schematron) to xmlvalid.
Thank you very much for your help!
Prior to change
tools/xmlvalid ~/Desktop/course_lif/output.ome.tiff
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating /Users/jmarie/Desktop/course_lif/output.ome.tiff
Error parsing schema at http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
org.xml.sax.SAXParseException: s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than 'xs:appinfo' and 'xs:documentation'. Saw '301 Moved Permanently'.
After the change in the nginx configuration
tools/xmlvalid ~/Desktop/course_lif/output.ome.tiff
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating /Users/jmarie/Desktop/course_lif/output.ome.tiff
No validation errors found.
Let me know if there are any issues.
Thanks @jburel. Works for me with 8.2.0:
$ wget https://downloads.openmicroscopy.org/images/OME-XML/2016-06/hcs.ome.xml
...
$ xmlvalid hcs.ome.xml
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating hcs.ome.xml
No validation errors found.
$ curl -IL http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
HTTP/1.1 200 OK
Server: nginx/1.28.0
Date: Mon, 30 Jun 2025 19:44:58 GMT
Content-Type: application/octet-stream
Content-Length: 261500
Last-Modified: Fri, 27 Jun 2025 12:06:59 GMT
Connection: keep-alive
ETag: "685e8963-3fd7c"
Accept-Ranges: bytes
Thanks for implementing the change.