ND2: Invalid XML characters

Open dgault opened this issue 4 years ago • 5 comments

Issue was raised on forum thread https://forum.image.sc/t/bfconvert-invalid-xml-character-unicode-0xffff-for-nd2-file/40453/3 and sample files have been provided in QA-29424

The issue was reproduced as described using Bio-Formats 6.5.1.

To reproduce, run showinf -nopix -omexml on a sample file, which results in:

[Fatal Error] :1:54: An invalid XML character (Unicode: 0xffff) was found in the element content of the document.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 54; An invalid XML character (Unicode: 0xffff) was found in the element content of the document.
at ome.xml.model.XMLAnnotation.asXMLElement(XMLAnnotation.java:263)
at ome.xml.model.StructuredAnnotations.asXMLElement(StructuredAnnotations.java:681)
at ome.xml.model.OME.asXMLElement(OME.java:931)
at ome.xml.model.OME.asXMLElement(OME.java:771)
at ome.xml.meta.AbstractOMEXMLMetadata.dumpXML(AbstractOMEXMLMetadata.java:110)
at ome.xml.meta.OMEXMLMetadataImpl.dumpXML(OMEXMLMetadataImpl.java:105)
at loci.formats.ome.OMEPyramidStore.dumpXML(OMEPyramidStore.java:81)
at loci.formats.services.OMEXMLServiceImpl.getOMEXML(OMEXMLServiceImpl.java:468)
at loci.formats.FormatReader.setId(FormatReader.java:1422)

From debugging I have been able to find the problem annotation; however, I have not yet been able to locate it in the raw file to determine the correct value for it.

The issue starts with parsing of ImageMetadataSeqLV|0! at https://github.com/ome/bioformats/blob/develop/components/formats-gpl/src/loci/formats/in/NativeND2Reader.java#L583

Parsing then loops at https://github.com/ome/bioformats/blob/develop/components/formats-gpl/src/loci/formats/in/NativeND2Reader.java#L1951 over the following key names:

SLxPictureMetadata
sPicturePlanes
sSampleSetting
a0
matCameraToStage

Then we get to the problem tag which is Data with the value z�1�E�����J�������J�?z�1�E��
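
For reference, the XML 1.0 specification allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF as characters, so both 0xFFFF and 0xFFFE are rejected by a conforming parser. The following is a minimal sketch of how a metadata value could be checked against that rule before it is stored as an XML annotation. The class name, method names and sample string are hypothetical, and this is not the change that was made in Bio-Formats.

```java
// Sketch only: checks strings against the XML 1.0 "Char" production.
// XmlCharCheck and its methods are hypothetical names, not Bio-Formats API.
final class XmlCharCheck {

  /** Returns true if the code point is a legal XML 1.0 character. */
  static boolean isValidXmlChar(int cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
  }

  /** Returns the char index of the first illegal code point, or -1 if none. */
  static int firstInvalidIndex(String s) {
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      if (!isValidXmlChar(cp)) {
        return i;
      }
      i += Character.charCount(cp);
    }
    return -1;
  }

  /** Returns a copy of the string with every illegal code point dropped. */
  static String stripInvalid(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      if (isValidXmlChar(cp)) {
        sb.appendCodePoint(cp);
      }
      i += Character.charCount(cp);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Made-up value containing U+FFFF and U+FFFE; not taken from the sample file.
    String value = "z\uFFFF1\uFFFEE";
    System.out.println(firstInvalidIndex(value)); // 1
    System.out.println(stripInvalid(value));      // z1E
  }
}
```

Dropping the characters is only one option. Since the Data value looks like raw binary rather than text, skipping the key or encoding the bytes may be more appropriate.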

dgault avatar Jul 21 '20 14:07 dgault

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/bfconvert-invalid-xml-character-unicode-0xffff-for-nd2-file/40453/4

imagesc-bot avatar Jul 21 '20 14:07 imagesc-bot

I also encountered this problem except that 0xffff was replaced by 0xfffe. This error is thrown when bfconvert or showinf -nopix -omexml is called on some of my nd2 files. With ImageJ/Fiji however, these files can be opened and the metadata can all be extracted, and I saw:

Data	b�������e?����e�b�����

Just curious why ImageJ/Fiji with its bioformats plugin wouldn't complain? Is the problematic process more related to writing and not reading?

yichechang avatar Aug 18 '22 02:08 yichechang

Thanks @yichechang for reporting your issue. Retesting with the original sample dataset, it appears that the original exception was resolved in the Bio-Formats 6.9.0 release, so any version from 6.9.0 onwards should run without exception. If your command line tools are an older version, it would be worth downloading the latest release (https://www.openmicroscopy.org/bio-formats/downloads/) and retesting.

dgault avatar Aug 18 '22 14:08 dgault

Thanks @dgault for the info and sorry for my delayed reply!

Indeed, after upgrading to the latest version of bftools I was able to run bfconvert and showinf -nopix -omexml on those once problematic files.

Slightly off-topic, but the underlying issue I originally faced was that CellProfiler (which uses the Bio-Formats reader) still throws errors for these files. I checked that it uses bioformats_package.jar version 6.10.0. I understand this is most likely something on CellProfiler's end and beyond Bio-Formats' scope, but I would like to verify that before I post in CellProfiler's repository, in case there's some detail that I wasn't aware of.

Thank you again! At least now I could use bfconvert to convert my nd2 files to tiff files before importing them into CellProfiler!


Edited for clarification regarding CellProfiler's part.

I'm fairly certain that it is the same bug in Bio-Formats and CellProfiler, because I have some nd2 files that can be handled only with Bio-Formats >= 6.9.0 and not earlier, and these files also cause CellProfiler to throw errors when extracting metadata from file headers. I also have nd2 files produced by another microscope, likely running a different version of the Nikon software, and these files are processed correctly by both Bio-Formats (regardless of its version) and CellProfiler.

yichechang avatar Aug 22 '22 14:08 yichechang

Thanks for the feedback @yichechang, glad to hear the upgrade worked. It does sound like the CellProfiler issue is likely the same bug; it may be worth reporting to the CellProfiler team to see whether an older version of the jar is being picked up on the classpath.
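
One generic way to see which jar a class is actually loaded from is to ask the JVM for the class's code source. The sketch below uses only standard Java APIs. The class name WhichJar is hypothetical, the default class name is taken from the stack trace above, and it assumes the snippet can be run with the same classpath as the application being investigated, which may not be straightforward for CellProfiler.

```java
import java.security.CodeSource;

// Sketch only: prints the jar or directory a class was loaded from.
// WhichJar is a hypothetical helper, not part of Bio-Formats or CellProfiler.
final class WhichJar {

  /** Returns the load location of the class, or a placeholder if unavailable. */
  static String locationOf(Class<?> c) {
    CodeSource src = c.getProtectionDomain().getCodeSource();
    if (src == null || src.getLocation() == null) {
      // Typical for classes loaded by the bootstrap class loader.
      return "(bootstrap or unknown)";
    }
    return src.getLocation().toString();
  }

  public static void main(String[] args) {
    // Default class name comes from the stack trace; pass another as args[0].
    String name = args.length > 0 ? args[0] : "loci.formats.FormatReader";
    try {
      System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    } catch (ClassNotFoundException e) {
      System.out.println(name + " is not on the classpath");
    }
  }
}
```

If the printed location is not the expected bioformats_package.jar, an older copy earlier on the classpath would be a plausible explanation.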

dgault avatar Aug 29 '22 10:08 dgault