
Fake hashes fail validation

Open joshmoore opened this issue 2 years ago • 5 comments

When trying to parse this converted fake:

$ cat /tmp/plate.fake.ini
plates=1
plateAcqs=1
plateRows=2
plateCols=2
fields=2

ome_types complains about the validation of the XML:

$ ome_zarr info /tmp/plate.ome.zarr/
WARNING:ome_zarr.io:version mismatch: detected:FormatV02, requested:FormatV04
WARNING:ome_zarr.io:version mismatch: detected:FormatV04, requested:FormatV02
ERROR:ome_zarr_metadata.spec:failed to parse metadata: 8 validation errors for OME
images -> 0 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 1 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 2 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 3 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 4 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 5 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 6 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
images -> 7 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)
/private/tmp/plate.ome.zarr [zgroup]
 - metadata
   - Plate
   - bioformats2raw
 - data
   - (1, 1, 1, 1024, 1024)

for this XML:

$ xmlindent /tmp/plate.ome.zarr/OME/METADATA.ome.xml
...
            <HashSHA1>
               1234567890ABCDEF1234567890ABCDEF12345678
            </HashSHA1>

from: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#Plane_HashSHA1

joshmoore avatar Apr 13 '22 12:04 joshmoore

Hashes for fake datasets are set here: https://github.com/ome/ome-model/blob/9f1fb5647f3c76473747643808ddb044b7d5ab45/ome-xml/src/main/java/ome/specification/XMLMockObjects.java#L1096

Ideally we'd change XMLMockObjects to use 20 characters for the hash, release ome-model and update the dependency version here. If a fix is needed urgently though, FakeReader could override the HashSHA1 for now.

melissalinkert avatar Apr 13 '22 14:04 melissalinkert

I'm simply stripping them out for the moment (like I'm injecting MetadataOnly) so no huge rush. I couldn't figure out a valid regex, so we might want to add that to the upstream docs as an example once we do.
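For illustration, the stripping workaround could look roughly like the sketch below, which uses only the Python standard library. The helper name `strip_plane_hashes` is hypothetical and this is not the code actually used in ome_zarr; the namespace is the 2016-06 one referenced in this issue. The pattern at the end assumes that HashSHA1 is meant to hold exactly 40 hexadecimal digits, as the fake value does.

```python
import re
import xml.etree.ElementTree as ET

OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"


def strip_plane_hashes(xml_text: str) -> str:
    """Return OME-XML with every Plane/HashSHA1 child removed (hypothetical helper)."""
    # Keep the OME namespace as the default one when serialising again.
    ET.register_namespace("", OME_NS)
    root = ET.fromstring(xml_text)
    # Materialise the list first so the tree is not modified while iterating it.
    for plane in list(root.iter(f"{{{OME_NS}}}Plane")):
        for hash_el in plane.findall(f"{{{OME_NS}}}HashSHA1"):
            plane.remove(hash_el)
    return ET.tostring(root, encoding="unicode")


# Assumed pattern: exactly 40 hexadecimal digits (apply it to the stripped text,
# since pretty-printers such as xmlindent add surrounding whitespace).
SHA1_HEX = re.compile(r"[0-9A-Fa-f]{40}")
```

With this, `SHA1_HEX.fullmatch(text.strip())` accepts the fake hash shown above, which may be a starting point for the regex example in the upstream docs.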

joshmoore avatar Apr 13 '22 15:04 joshmoore

While trying to establish the language variations in preparation for a deprecation (as discussed in https://github.com/ome/ome-model/pull/158#issuecomment-1101227811 ), I used the following example:

<?xml version="1.0" encoding="UTF-8"?>
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd">
   <Experiment ID="Experiment:0" Type="Photobleaching">
      <Description>Experiment</Description>
   </Experiment>
   <Plate ColumnNamingConvention="number" Columns="1" ExternalIdentifier="External Identifier" ID="Plate:0" Name="Plate Name 0" RowNamingConvention="letter" Rows="1" Status="Plate status" WellOriginX="0.0" WellOriginXUnit="µm" WellOriginY="1.0" WellOriginYUnit="µm">
      <Description>Plate 0 of 1</Description>
      <Well Color="255" Column="0" ExternalDescription="External Description" ExternalIdentifier="External Identifier" ID="Well:0_0_0_0" Row="0" Type="Transfection: done">
         <WellSample ID="WellSample:0_0_0_0_0_0" Index="0" PositionX="0.0" PositionXUnit="reference frame" PositionY="1.0" PositionYUnit="reference frame" Timepoint="2006-05-04T18:13:51">
            <ImageRef ID="Image:0"/>
         </WellSample>
      </Well>
      <PlateAcquisition EndTime="2006-05-04T18:13:51" ID="PlateAcquisition:0" Name="PlateAcquisition Name 0" StartTime="2006-05-04T18:13:51">
         <Description>PlateAcquisition 0 of 1</Description>
         <WellSampleRef ID="WellSample:0_0_0_0_0_0"/>
      </PlateAcquisition>
   </Plate>
   <Instrument ID="Instrument:0">
      <Microscope LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="Upright"/>
      <Laser FrequencyMultiplication="30" ID="LightSource:0" LaserMedium="Alexandrite" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" PockelCell="false" Power="200.0" PowerUnit="mW" RepetitionRate="30.0" RepetitionRateUnit="aHz" SerialNumber="0123456789" Tuneable="false" Type="Dye" Wavelength="200.0" WavelengthUnit="nm"/>
      <Arc ID="LightSource:1" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789" Type="HgXe"/>
      <Filament ID="LightSource:2" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789" Type="Halogen"/>
      <LightEmittingDiode ID="LightSource:3" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Power="200.0" PowerUnit="mW" SerialNumber="0123456789"/>
      <Laser FrequencyMultiplication="30" ID="LightSource:4" LaserMedium="Alexandrite" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" PockelCell="false" Power="200.0" PowerUnit="mW" RepetitionRate="30.0" RepetitionRateUnit="aHz" SerialNumber="0123456789" Tuneable="false" Type="Dye" Wavelength="200.0" WavelengthUnit="nm"/>
      <Detector AmplificationGain="0.0" Gain="1.0" ID="Detector:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" Offset="2.0" SerialNumber="0123456789" Type="CCD" Voltage="100" VoltageUnit="V" Zoom="3.0"/>
      <Objective CalibratedMagnification="1.0" Correction="UV" ID="Objective:0" Immersion="Oil" Iris="true" LensNA="0.5" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" NominalMagnification="1.5" SerialNumber="0123456789" WorkingDistance="1.0" WorkingDistanceUnit="µm"/>
      <FilterSet ID="FilterSet:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789"/>
      <Filter ID="Filter:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="LongPass">
         <TransmittanceRange CutIn="200.0" CutInTolerance="1.0" CutInToleranceUnit="nm" CutInUnit="nm" CutOut="300.0" CutOutTolerance="1.0" CutOutToleranceUnit="nm" CutOutUnit="nm" Transmittance="0.5"/>
      </Filter>
      <Filter ID="Filter:1" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789" Type="LongPass">
         <TransmittanceRange CutIn="200.0" CutInTolerance="1.0" CutInToleranceUnit="nm" CutInUnit="nm" CutOut="300.0" CutOutTolerance="1.0" CutOutToleranceUnit="nm" CutOutUnit="nm" Transmittance="0.5"/>
      </Filter>
      <Dichroic ID="Dichroic:0" LotNumber="9876543210" Manufacturer="Manufacturer" Model="Model" SerialNumber="0123456789"/>
   </Instrument>
   <Image ID="Image:0" Name="test">
      <Description>Image Description 0</Description>
      <ExperimentRef ID="Experiment:0"/>
      <ImagingEnvironment AirPressure="1.0" AirPressureUnit="mbar" CO2Percent="1.0" Humidity="1.0" Temperature="1.0" TemperatureUnit="°C"/>
      <StageLabel Name="StageLabel" X="1.0" XUnit="reference frame" Y="1.0" YUnit="reference frame" Z="1.0" ZUnit="reference frame"/>
      <Pixels BigEndian="false" DimensionOrder="XYZCT" ID="Pixels:0" Interleaved="false" PhysicalSizeX="1" PhysicalSizeXUnit="µm" PhysicalSizeY="1" PhysicalSizeYUnit="µm" PhysicalSizeZ="1" PhysicalSizeZUnit="µm" SignificantBits="8" SizeC="1" SizeT="1" SizeX="512" SizeY="512" SizeZ="1" Type="uint8">
         <Channel AcquisitionMode="FluorescenceLifetime" Color="1687603455" ContrastMethod="Brightfield" EmissionWavelength="300.3" EmissionWavelengthUnit="nm" ExcitationWavelength="400.3" ExcitationWavelengthUnit="nm" Fluor="Fluor" ID="Channel:0:0" IlluminationType="Oblique" NDFilter="1.0" Name="Name" PinholeSize="0.5" PinholeSizeUnit="µm" PockelCellSetting="0" SamplesPerPixel="1">
            <LightSourceSettings Attenuation="1.0" ID="LightSource:0" Wavelength="200.2" WavelengthUnit="nm"/>
            <DetectorSettings Binning="2x2" Gain="1.0" ID="Detector:0" Integration="20" Offset="1.0" ReadOutRate="1.0" ReadOutRateUnit="Hz" Voltage="1.0" VoltageUnit="V" Zoom="3.0"/>
            <LightPath>
               <ExcitationFilterRef ID="Filter:1"/>
               <DichroicRef ID="Dichroic:0"/>
               <EmissionFilterRef ID="Filter:0"/>
            </LightPath>
         </Channel>
         <MetadataOnly/>
         <Plane DeltaT="0.1" DeltaTUnit="s" ExposureTime="10.0" ExposureTimeUnit="s" PositionX="1.0" PositionXUnit="reference frame" PositionY="1.0" PositionYUnit="reference frame" PositionZ="1.0" PositionZUnit="reference frame" TheC="0" TheT="0" TheZ="0">
            <HashSHA1>1234567890ABCDEF1234567890ABCDEF12345678</HashSHA1>
         </Plane>
      </Pixels>
   </Image>
</OME>

Bio-Formats xmlvalid

% ./bftools/xmlvalid out.xml 
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd
Validating out.xml
No validation errors found.

Python's xmlschema

>>> import xmlschema
>>> xsd = xmlschema.XMLSchema('http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd')
>>> xsd.validate('out.xml')
>>>

Python ome_types (used above)

>>> import ome_types
>>> ome_types.from_xml('out.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/_convenience.py", line 29, in from_xml
    return OME(**d)  # type: ignore
  File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/model/ome.py", line 137, in __init__
    super().__init__(**data)
  File "/Users/sbesson/Downloads/venv/lib/python3.10/site-packages/ome_types/_base_type.py", line 80, in __init__
    super().__init__(**data)
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for OME
images -> 0 -> pixels -> planes -> 0 -> hash_sha1
  ensure this value has at most 20 characters (type=value_error.any_str.max_length; limit_value=20)

Looking briefly at the ome_types code, I suspect this is related to https://github.com/tlambert03/ome-types/blob/eea4f503e80018ca60be7ed0616e258d1471d455/src/ome_autogen.py#L984-L986 which seems to apply the 20-character limit to the ConstrainedStr directly. /cc @tlambert03
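A possible explanation for the discrepancy between the validators: in XML Schema, the length facet of an `xs:hexBinary` value counts octets, not characters. Assuming Plane.HashSHA1 is declared as a hexBinary restricted to length 20 (an assumption based on the linked schema documentation, not re-checked here), a 40-digit hash is schema-valid but fails a 20-character string limit. A quick check on the fake value:

```python
# The fake hash from the OME-XML above.
fake_hash = "1234567890ABCDEF1234567890ABCDEF12345678"

MAX_LENGTH = 20  # the limit reported in the validation error

n_chars = len(fake_hash)                  # what a string max_length check counts
n_octets = len(bytes.fromhex(fake_hash))  # what a hexBinary length facet counts

print(n_chars, n_octets)  # 40 20
print(n_chars <= MAX_LENGTH, n_octets <= MAX_LENGTH)  # False True
```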

This does not invalidate the statement in https://github.com/ome/ome-model/pull/158#issuecomment-1101452483 that the value of the Plane.HashSHA1 element is negligible and that we should move towards deprecating this element and removing it from the FakeReader OME-XML representation.

sbesson avatar Apr 18 '22 15:04 sbesson

Happy to change that bit in ome_types. I can't remember the details now, but I did that when trying to update xmlschema to v >1.5 ... it gave me some annoyances and I guess that's where I ended up. But it would seem to be harmless to remove the constraint?

tlambert03 avatar Apr 18 '22 15:04 tlambert03

You mean a simple no-op class similar to https://github.com/tlambert03/ome-types/blob/eea4f503e80018ca60be7ed0616e258d1471d455/src/ome_autogen.py#L981 ? This would work for this particular use case. I assume there is a built-in way to use the xmlschema encoding/decoding capabilities, but as mentioned above, this specific element is outdated and we'll likely move towards removing it from the synthetically generated images.
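As a rough sketch of what such a no-op class might look like: the traceback above points at a pydantic v1 style model, where a custom type exposes its validators through `__get_validators__`. The class name `NoOpHex` is hypothetical and this is not the code in ome_autogen.py; it only shows a string type that carries no length constraint.

```python
from typing import Any, Callable, Iterator


class NoOpHex(str):
    """Hypothetical no-op string type: coerces to str and checks nothing else."""

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[[Any], "NoOpHex"]]:
        # pydantic v1 calls each yielded validator in turn on the field value.
        yield cls.validate

    @classmethod
    def validate(cls, value: Any) -> "NoOpHex":
        # No max_length or pattern check: the value passes through unchanged.
        return cls(value)
```

A field annotated with a type like this would accept the 40-character fake hash; whether that is the right trade-off depends on how soon the element is removed.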

sbesson avatar Apr 18 '22 16:04 sbesson