bioformats
ND2: Invalid XML characters
Issue was raised on forum thread https://forum.image.sc/t/bfconvert-invalid-xml-character-unicode-0xffff-for-nd2-file/40453/3 and sample files have been provided in QA-29424
The issue was reproduced as described using Bio-Formats 6.5.1.
To reproduce, run showinf -nopix -omexml on a sample file, which results in:
[Fatal Error] :1:54: An invalid XML character (Unicode: 0xffff) was found in the element content of the document.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 54; An invalid XML character (Unicode: 0xffff) was found in the element content of the document.
at ome.xml.model.XMLAnnotation.asXMLElement(XMLAnnotation.java:263)
at ome.xml.model.StructuredAnnotations.asXMLElement(StructuredAnnotations.java:681)
at ome.xml.model.OME.asXMLElement(OME.java:931)
at ome.xml.model.OME.asXMLElement(OME.java:771)
at ome.xml.meta.AbstractOMEXMLMetadata.dumpXML(AbstractOMEXMLMetadata.java:110)
at ome.xml.meta.OMEXMLMetadataImpl.dumpXML(OMEXMLMetadataImpl.java:105)
at loci.formats.ome.OMEPyramidStore.dumpXML(OMEPyramidStore.java:81)
at loci.formats.services.OMEXMLServiceImpl.getOMEXML(OMEXMLServiceImpl.java:468)
at loci.formats.FormatReader.setId(FormatReader.java:1422)
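The same kind of SAXParseException can be triggered outside Bio-Formats. The following is a minimal sketch using only the JDK's built-in XML parser; the class name InvalidXmlCharDemo and the element name in the test strings are made up for illustration and do not come from the Bio-Formats code base. It parses one well-formed string and one that contains U+FFFF in element content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of Bio-Formats.
class InvalidXmlCharDemo {

    /** Returns the parser's error message, or null if the text parses cleanly. */
    static String parseError(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parse from a character stream so no byte decoding is involved.
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Raised for content that is not well-formed XML,
            // including characters that XML 1.0 does not allow.
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseError("<Data>ok</Data>"));
        System.out.println(parseError("<Data>z\uFFFF</Data>"));
    }
}
```

With the JDK's default error handler, the second call should also write a "[Fatal Error]" line to stderr, similar to the first line of the log above.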
From debugging I have been able to find the problem annotation; however, I have not yet been able to locate it in the raw file to determine the correct value for it.
The issue starts with the parsing of ImageMetadataSeqLV|0! at https://github.com/ome/bioformats/blob/develop/components/formats-gpl/src/loci/formats/in/NativeND2Reader.java#L583
which then loops on https://github.com/ome/bioformats/blob/develop/components/formats-gpl/src/loci/formats/in/NativeND2Reader.java#L1951 for the key names below:
SLxPictureMetadata
sPicturePlanes
sSampleSetting
a0
matCameraToStage
Then we get to the problem tag, which is Data, with the value z�1�E�����J�������J�?z�1�E��
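For context, XML 1.0 only permits the characters #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, so both 0xFFFF and 0xFFFE fall outside the allowed range. The sketch below shows one way such a value could be checked and cleaned before being placed in an XML document. It is an illustration only: the class and method names are hypothetical, and it is not necessarily how Bio-Formats handles this tag.

```java
// Hypothetical helper; not part of Bio-Formats.
class XmlCharSanitizer {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every disallowed code point dropped. */
    static String stripInvalid(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point width.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "z\uFFFF1\uFFFEE";
        System.out.println(stripInvalid(raw)); // prints "z1E"
    }
}
```

Dropping characters is the simplest policy. For a tag that appears to hold binary data, as here, skipping the annotation or encoding the raw bytes may be a better choice.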
This issue has been mentioned on Image.sc Forum. There might be relevant details there:
https://forum.image.sc/t/bfconvert-invalid-xml-character-unicode-0xffff-for-nd2-file/40453/4
I also encountered this problem, except that 0xffff was replaced by 0xfffe. This error is thrown when bfconvert or showinf -nopix -omexml is called on some of my nd2 files. With ImageJ/Fiji, however, these files can be opened and all of the metadata can be extracted, and I saw:
Data b�������e?����e�b�����
Just curious why ImageJ/Fiji with its bioformats plugin wouldn't complain? Is the problematic process more related to writing and not reading?
Thanks @yichechang for reporting your issue. I retested with the original sample dataset, and it appears that the original exception was resolved in the Bio-Formats 6.9.0 release, so any version from 6.9.0 onwards should run without the exception. If your command line tools are an older version, it would be worth downloading the latest release (https://www.openmicroscopy.org/bio-formats/downloads/) and retesting.
Thanks @dgault for the info and sorry for my delayed reply!
Indeed, after upgrading to the latest version of bftools I was able to run bfconvert and showinf -nopix -omexml on those once problematic files.
Slightly off-topic, but the underlying issue I originally faced was that CellProfiler (which uses the bioformats reader) still throws errors for these files. I checked that it uses bioformats_package.jar with version 6.10.0. I understand this is most likely something on CellProfiler's end and beyond bioformats' scope, but I would like to verify that before I post in CellProfiler's repository, in case there is some detail I wasn't aware of.
Thank you again! At least now I could use bfconvert to convert my nd2 files to tiff files before importing them into CellProfiler!
Edited for clarification regarding CellProfiler's part.
I'm fairly certain that it is the same bug in Bio-Formats and CellProfiler, because I have some nd2 files that can be handled only with Bio-Formats >= 6.9.0 and not with earlier versions. These files also cause CellProfiler to throw errors when extracting metadata from file headers. I also have nd2 files produced by another microscope, likely running a different version of the Nikon software, and these files are processed correctly by both Bio-Formats (regardless of its version) and CellProfiler.
Thanks for the feedback @yichechang, glad to hear the upgrade worked. It does sound like the CellProfiler issue is likely the same bug. It may be worth reporting to the CellProfiler team to see whether an older version of the jar is being picked up on the classpath.
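One generic way to check which jar a class is actually loaded from is to ask the JVM for the class's code source. The sketch below uses only the standard library. The class JarLocator is hypothetical, and the default class name loci.formats.FormatReader is taken from the stack trace earlier in this thread. It would need to run with the same classpath as the application being investigated, which is an assumption about how that application is launched.

```java
import java.security.CodeSource;

// Hypothetical diagnostic class; not part of Bio-Formats or CellProfiler.
class JarLocator {

    /** Describes where the named class was loaded from. */
    static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(
                className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Core JDK classes from the bootstrap loader have no code source.
            return source == null
                ? "(JDK runtime)"
                : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class name from the stack trace in this thread; override via args[0].
        String name = args.length > 0 ? args[0] : "loci.formats.FormatReader";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location points at a jar other than the expected bioformats_package.jar, an older copy is probably shadowing it.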