bioformats
Fake hashes fail validation
When trying to parse this converted fake:
$ cat /tmp/plate.fake.ini
plates=1
plateAcqs=1
plateRows=2
plateCols=2
fields=2
ome_types complains about the validation of the XML:
$ ome_zarr info /tmp/plate.ome.zarr/
WARNING:ome_zarr.io:version mismatch: detected:FormatV02, requested:FormatV04
WARNING:ome_zarr.io:version mismatch: detected:FormatV04, requested:FormatV02
ERROR:ome_zarr_metadata.spec:failed to parse metadata: 8 validation errors for OME
images -> 0 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 1 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 2 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 3 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 4 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 5 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 6 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 7 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
/private/tmp/plate.ome.zarr [zgroup]
- metadata
- Plate
- bioformats2raw
- data
- (1, 1, 1, 1024, 1024)
for this XML:
$ xmlindent /tmp/plate.ome.zarr/OME/METADATA.ome.xml
...
<HashSHA1>
1234567890ABCDEF1234567890ABCDEF12345678
</HashSHA1>
from: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#Plane_HashSHA1
Hashes for fake datasets are set here: https://github.com/ome/ome-model/blob/9f1fb5647f3c76473747643808ddb044b7d5ab45/ome-xml/src/main/java/ome/specification/XMLMockObjects.java#L1096
Ideally we'd change XMLMockObjects to use 20 characters for the hash, release ome-model and update the dependency version here. If a fix is needed urgently though, FakeReader could override the HashSHA1 for now.
I'm simply stripping them out for the moment (like I'm injecting MetadataOnly) so no huge rush. I couldn't figure out a valid regex, so we might want to add that to the upstream docs as an example once we do.
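For reference, the stripping workaround mentioned above could look roughly like the following. This is only a sketch using the standard library's ElementTree, not the code actually used in the thread; the function name strip_plane_hashes and the inline sample document are hypothetical, and the namespace is assumed to be the 2016-06 OME schema shown elsewhere in this issue.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the OME 2016-06 namespace.
OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"


def strip_plane_hashes(xml_text: str) -> str:
    """Return the OME-XML with every HashSHA1 child of a Plane removed."""
    # Keep the default namespace unprefixed when serializing again.
    ET.register_namespace("", OME_NS)
    root = ET.fromstring(xml_text)
    # Materialize the iterator first so the tree is not mutated mid-walk.
    for plane in list(root.iter(f"{{{OME_NS}}}Plane")):
        for hash_el in plane.findall(f"{{{OME_NS}}}HashSHA1"):
            plane.remove(hash_el)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed sample document for illustration only.
sample = (
    f'<OME xmlns="{OME_NS}"><Image ID="Image:0"><Pixels ID="Pixels:0">'
    '<Plane TheC="0" TheT="0" TheZ="0">'
    "<HashSHA1>1234567890ABCDEF1234567890ABCDEF12345678</HashSHA1>"
    "</Plane></Pixels></Image></OME>"
)
cleaned = strip_plane_hashes(sample)
print(cleaned)
```

Working on the parsed tree avoids needing a regex at all, which may be simpler than finding a pattern that copes with the whitespace xmlindent adds around the hash value.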
While trying to establish the language variations in preparation for a deprecation (as discussed in https://github.com/ome/ome-model/pull/158#issuecomment-1101227811 ), I have used the following example:
<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd">
<Experiment ID="Experiment:0" Type="Photobleaching">
<Description>Experiment</Description>
</Experiment>
<Plate ColumnNamingConvention="number" Columns="1" ExternalIdentifier="External Identifier" ID="Plate:0" Name="Plate Name 0" RowNamingConvention="letter" Rows="1" Status="Plate status" WellOriginX="0.0" WellOriginXUnit="µm" WellOriginY="1.0" WellOriginYUnit="µm">
<Description>Plate 0 of 1</Description>
<Well Color="255" Column="0" ExternalDescription="External Description" ExternalIdentifier="External Identifier" ID="Well:0_0_0_0" Row="0" Type="Transfection: done">
<WellSample ID="WellSample:0_0_0_0_0_0" Index="0" PositionX="0.0" PositionXUnit="reference frame" PositionY="1.0" PositionYUnit="reference frame" Timepoint="2006-05-04T18:13:51">
<ImageRef ID="Image:0"/>
</WellSample>
</Well>
<PlateAcquisition EndTime="2006-05-04T18:13:51" ID="PlateAcquisition:0" Name="PlateAcquisition Name 0" StartTime="2006-05-04T18:13:51">
<Description>PlateAcquisition 0 of 1</Description>
<WellSampleRef ID="WellSample:0_0_0_0_0_0"/>
</PlateAcquisition>
</Plate>
<Instrument ID="Instrument:0">
<Microscope LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="Upright"/>
<Laser FrequencyMultiplication="30" ID="LightSource:0" LaserMedium="Alexandrite" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" PockelCell="false" Power="200.0" PowerUnit="mW" RepetitionRate="30.0" RepetitionRateUnit="aHz" SerialNumber="0123456789" Tuneable="false" Type="Dye" Wavelength="200.0" WavelengthUnit="nm"/>
<Arc ID="LightSource:1" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789" Type="HgXe"/>
<Filament ID="LightSource:2" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789" Type="Halogen"/>
<LightEmittingDiode ID="LightSource:3" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789"/>
<Laser FrequencyMultiplication="30" ID="LightSource:4" LaserMedium="Alexandrite" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" PockelCell="false" Power="200.0" PowerUnit="mW" RepetitionRate="30.0" RepetitionRateUnit="aHz" SerialNumber="0123456789" Tuneable="false" Type="Dye" Wavelength="200.0" WavelengthUnit="nm"/>
<Detector AmplificationGain="0.0" Gain="1.0" ID="Detector:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Offset="2.0" SerialNumber="0123456789" Type="CCD" Voltage="100" VoltageUnit="V" Zoom="3.0"/>
<Objective CalibratedMagnification="1.0" Correction="UV" ID="Objective:0" Immersion="Oil" Iris="true" LensNA="0.5" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" NominalMagnification="1.5" SerialNumber="0123456789" WorkingDistance="1.0" WorkingDistanceUnit="µm"/>
<FilterSet ID="FilterSet:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789"/>
<Filter ID="Filter:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="LongPass">
<TransmittanceRange CutIn="200.0" CutInTolerance="1.0" CutInToleranceUnit="nm" CutInUnit="nm" CutOut="300.0" CutOutTolerance="1.0" CutOutToleranceUnit="nm" CutOutUnit="nm" Transmittance="0.5"/>
</Filter>
<Filter ID="Filter:1" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="LongPass">
<TransmittanceRange CutIn="200.0" CutInTolerance="1.0" CutInToleranceUnit="nm" CutInUnit="nm" CutOut="300.0" CutOutTolerance="1.0" CutOutToleranceUnit="nm" CutOutUnit="nm" Transmittance="0.5"/>
</Filter>
<Dichroic ID="Dichroic:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789"/>
</Instrument>
<Image ID="Image:0" Name="test">
<Description>Image Description 0</Description>
<ExperimentRef ID="Experiment:0"/>
<ImagingEnvironment AirPressure="1.0" AirPressureUnit="mbar" CO2Percent="1.0" Humidity="1.0" Temperature="1.0" TemperatureUnit="°C"/>
<StageLabel Name="StageLabel" X="1.0" XUnit="reference frame" Y="1.0" YUnit="reference frame" Z="1.0" ZUnit="reference frame"/>
<Pixels BigEndian="false" DimensionOrder="XYZCT" ID="Pixels:0" Interleaved="false" PhysicalSizeX="1" PhysicalSizeXUnit="µm" PhysicalSizeY="1" PhysicalSizeYUnit="µm" PhysicalSizeZ="1" PhysicalSizeZUnit="µm" SignificantBits="8" SizeC="1" SizeT="1" SizeX="512" SizeY="512" SizeZ="1" Type="uint8">
<Channel AcquisitionMode="FluorescenceLifetime" Color="1687603455" ContrastMethod="Brightfield" EmissionWavelength="300.3" EmissionWavelengthUnit="nm" ExcitationWavelength="400.3" ExcitationWavelengthUnit="nm" Fluor="Fluor" ID="Channel:0:0" IlluminationType="Oblique" NDFilter="1.0" Name="Name" PinholeSize="0.5" PinholeSizeUnit="µm" PockelCellSetting="0" SamplesPerPixel="1">
<LightSourceSettings Attenuation="1.0" ID="LightSource:0" Wavelength="200.2" WavelengthUnit="nm"/>
<DetectorSettings Binning="2x2" Gain="1.0" ID="Detector:0" Integration="20" Offset="1.0" ReadOutRate="1.0" ReadOutRateUnit="Hz" Voltage="1.0" VoltageUnit="V" Zoom="3.0"/>
<LightPath>
<ExcitationFilterRef ID="Filter:1"/>
<DichroicRef ID="Dichroic:0"/>
<EmissionFilterRef ID="Filter:0"/>
</LightPath>
</Channel>
<MetadataOnly/>
<Plane DeltaT="0.1" DeltaTUnit="s" ExposureTime="10.0" ExposureTimeUnit="s" PositionX="1.0" PositionXUnit="reference frame" PositionY="1.0" PositionYUnit="reference frame" PositionZ="1.0" PositionZUnit="reference frame" TheC="0" TheT="0" TheZ="0">
<HashSHA1>1234567890ABCDEF1234567890ABCDEF12345678</HashSHA1>
</Plane>
</Pixels>
</Image>
</OME>
Bio-Formats xmlvalid
% ./bftools/xmlvalid out.xml
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating out.xml
No validation errors found.
Python's xmlschema
>>> import xmlschema
>>> xsd = xmlschema.XMLSchema('http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd')
>>> xsd.validate('out.xml')
>>>
Python ome_types (used above)
>>> import ome_types
>>> ome_types.from_xml('out.xml')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/_convenience.py", line 29, in from_xml
return OME(**d) # type: ignore
File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/model/ome.py", line 137, in __init__
super().__init__(**data)
File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/_base_type.py", line 80, in __init__
super().__init__(**data)
File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for OME
images -> 0 -> pixels -> planes -> 0 -> hash_sha1
ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
Looking briefly at the ome_types code, I suspect this is related to https://github.com/tlambert03/ome-types/blob/eea4f503e80018ca60be7ed0616e258d1471d455/src/ome_autogen.py#L984-L986 , which seems to apply the 20-character limit to the ConstrainedStr directly. /cc @tlambert03
This does not invalidate the statement in https://github.com/ome/ome-model/pull/158#issuecomment-1101452483 that the value of the Plane.HashSHA1 element is negligible and that we should move towards deprecating this element and removing it from the FakeReader OME-XML representation.
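The disagreement between the two XSD validators and ome_types comes down to what the length of 20 counts. A minimal sketch, assuming the schema type behind HashSHA1 is an xsd:hexBinary restricted to length 20 (where XSD measures hexBinary length in octets rather than characters); is_valid_hex40 is a hypothetical helper, not part of ome_types or xmlschema:

```python
import re

# Assumption: the schema restricts xsd:hexBinary to length 20, counted in octets.
HEX40_OCTETS = 20


def is_valid_hex40(value: str) -> bool:
    """Hypothetical check: True if value is hex text encoding exactly 20 octets."""
    # hexBinary needs an even number of hex digits and nothing else.
    if not re.fullmatch(r"[0-9a-fA-F]*", value) or len(value) % 2:
        return False
    return len(bytes.fromhex(value)) == HEX40_OCTETS


fake_hash = "1234567890ABCDEF1234567890ABCDEF12345678"
print(len(fake_hash))                  # 40 characters, over a 20-character string limit
print(len(bytes.fromhex(fake_hash)))   # 20 octets, matching the schema length
print(is_valid_hex40(fake_hash))       # True
```

Under that reading, the fake hash is schema-valid, and a 20-character limit on the raw string rejects any well-formed SHA-1 hex digest.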
Happy to change that bit in ome_types. I can't remember the details now, but I did that when trying to update xmlschema to v >1.5 ... it gave me some annoyances and I guess that's where I ended up. But it would seem to be harmless to remove the constraint?
You mean a simple no-op class similar to https://github.com/tlambert03/ome-types/blob/eea4f503e80018ca60be7ed0616e258d1471d455/src/ome_autogen.py#L981 ? This would work for this particular use case. I assume there is a built-in way to use the xmlschema encoding/decoding capabilities, but as mentioned above, this specific element is outdated and we'll likely move towards removing it from the synthetically generated images.