bioformats icon indicating copy to clipboard operation
bioformats copied to clipboard

Reworking LMSMetadata package and adding a new lif reader

Open XLEFReaderForBioformats opened this issue 2 years ago • 22 comments

Main changes:

  • reworking metadata translation (and class architecture) in the LMSMetadata package, improving metadata translation especially for SP8 and STELLARIS images, also reworking metadata translation for other images (SP5, widefield)
  • Adding a new LMSLIFReader for reading .lif files that also uses the LMSMetadata package

In general, metadata translation will be an ongoing topic here, and there might also be images from systems that are not yet fully regarded and where metadata translation needs to be improved. I'll try to fix all major bugs that might be noticed in the context of this PR, but concerning minor bugs and "missing metadata" , I guess I have to discuss with my colleagues how much of that I will be able to fix within this PR.

@dgault as we have discussed before, since you suggested that we could use a DelegateReader here to provide both the existing LIFReader and the new LMSLIFReader to users, I think that's a reasonable approach, so this PR at least won't create a deterioration of existing .lif files reading. I removed my temporal changes to readers.txt and LIFReader.isThisType(...) for testing so now the LIFReader is the standard reader for .lif files again. Could you let me know how using a delegate reader would look like so both lif readers can be used?

Please let me know if you need any additional information or explanations from my side.

XLEFReaderForBioformats avatar May 04 '23 10:05 XLEFReaderForBioformats

Thanks @XLEFReaderForBioformats for the effort to get this completed.

The DelegateReader would be a rather simple new Reader looking similar to https://github.com/ome/bioformats/blob/develop/components/formats-gpl/src/loci/formats/in/ND2Reader.java. There is the longer term option of adding some smarter decision making logic into the DelegateReader but for now the basic version should suffice. This way the 2 readers can be switched through a single API call.

Are there any specific files that would be good to test the new LMSLIFReader?

dgault avatar May 04 '23 11:05 dgault

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/tiling-issue-when-opening-leica-mica-lif-images-in-fiji-bio-formats/79769/12

imagesc-bot avatar Jul 26 '23 13:07 imagesc-bot

@XLEFReaderForBioformats : in order to review this pull request for inclusion, we need to have a better understanding of the testing requirements, including test data as requested in https://github.com/ome/bioformats/pull/4000#issuecomment-1534605538. If we don't hear back within the next week, this pull request will be closed (but you are welcome to re-open it once testing information is added).

melissalinkert avatar Jan 15 '24 16:01 melissalinkert

Hi @melissalinkert , sorry for the late reply. We are currently collecting test data for the new LIF reader which we are planning to provide to you, before that we also want to run some in-house tests to get a first idea of what's still missing.

I can hopefully continue my work on this PR in this quarter and will keep you posted on the test data. We would like to suggest that the new LIF reader is used as standard option, but to avoid any regression, we need to prepare this carefully.

XLEFReaderForBioformats avatar Jan 16 '24 08:01 XLEFReaderForBioformats

@XLEFReaderForBioformats : thanks for your continued work here. Is there any update on relevant test data? I understand you may still have some work in progress, but since this is already a very large set of changes we would like to start planning a testing strategy as soon as practical.

melissalinkert avatar Feb 26 '24 23:02 melissalinkert

Hi @melissalinkert,

I have collected a new set of test data that I can link here today. It is not yet complete, but I will update it as soon as I have more data at hand.

Is there any other information that you need for planning the testing strategy? I would try to use the DelegateReader solution as suggested by David, but also we would like to set the new LMSLIFReader as the default option, so that we can reach the majority of users with the new implementation. We're aware that this might require some more careful preparations.

XLEFReaderForBioformats avatar Feb 27 '24 08:02 XLEFReaderForBioformats

Hi @melissalinkert ,

here is the test data that I can provide so far. I focused on collecting data from different imaging systems that create different XML layouts.

Widefield:

MICA:

I also provided confocal images in the first dataset (SP8, STELLARIS 5, STELLARIS 8), but I am planning to share SP5 and maybe additional SP8 data as well.

edit @dgault @melissalinkert : here is an additional SP5 dataset:

XLEFReaderForBioformats avatar Feb 27 '24 09:02 XLEFReaderForBioformats

Thank you for the update, @XLEFReaderForBioformats. For now, I think if you can let us know when you have finished any planned changes, that would be helpful. If you are planning to use a DelegateReader, that should be done before we start reviewing and testing. The current state of this pull request turns off the original LIF reader entirely, so having the DelegateReader in place (with either reader as default) would make it much easier to compare the two readers' behavior.

For internal testing, all datasets linked from https://github.com/ome/bioformats/pull/4000#issuecomment-1966113007 are now in inbox/leica-2024-02-27/.

melissalinkert avatar Feb 28 '24 16:02 melissalinkert

Hi @melissalinkert ,

the only thing that's still pending for me (besides the DelegateReader solution) is the support for SP5 images without LDM block. I'll finish this until next week hopefully, and then I think we can start the reviewing and testing.

XLEFReaderForBioformats avatar Feb 29 '24 10:02 XLEFReaderForBioformats

Hi @dgault ,

I am now ready to implement your suggested DelegateReader solution. You linked ND2Reader as an example, is this info still up to date? I found DelegateReader examples in JPEGReader and TiffDelegateReader which look comprehensible to me at first sight, in ND2Reader I couldn't find where the logic for using different readers would be. A clarification would be welcome. :)

edit: nvm, I oriented towards the TiffDelegateReader and it's working ^^ (not sure though if there are any methods missing in LIFDelegateReader that I should add)

XLEFReaderForBioformats avatar Mar 11 '24 11:03 XLEFReaderForBioformats

The ND2 Reader example is out of date now as we removed the legacy ND2 reader in Bio-Formats 7.0.0. The older version can be seen here, but the other Delegate readers are going to be very similar. Looking quickly at the last commit and it looks to be correct.

I am going to mark this for inclusion with our nightly repository tests to give us an initial idea of the impact on existing filesets. I will update the PR once the tests have run to see the results.

dgault avatar Mar 12 '24 12:03 dgault

okay great, thanks for letting me know! 👍 excited for the results 😄

XLEFReaderForBioformats avatar Mar 12 '24 13:03 XLEFReaderForBioformats

This PR was included in the CI last night, unfortunately a number of files threw exceptions at the setID stage so the full test suite didn't actually get to run. I have only had a quick look at the failures but a lot of them look to be NullPointerExceptions. Unfortunately the files that are failing are not ones that are public.

The full console output from the test run is at: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastFailedBuild/console

dgault avatar Mar 13 '24 15:03 dgault

hi @dgault, okay thanks for sharing, I'll try to fix all the issues where exceptions are thrown.

XLEFReaderForBioformats avatar Mar 14 '24 09:03 XLEFReaderForBioformats

hi @dgault , I tried to address all the exceptions that happened in the tests, trying to increase the robustness against unexpected input. However, I do not yet understand the root causes of these errors, which are images with data that the reader does not expect. I understand that you cannot share these image data publicly, however is it maybe possible to make this image data available to me directly?

XLEFReaderForBioformats avatar Mar 14 '24 12:03 XLEFReaderForBioformats

@XLEFReaderForBioformats I'm curious to know whether your request will be approved because I've had exactly the same issue in here: https://github.com/ome/bioformats/pull/4092

This point has also been raised and answered in the image.sc forum: https://forum.image.sc/t/pb-when-i-open-czi-stitching-image/87585/24

NicoKiaru avatar Mar 20 '24 11:03 NicoKiaru

@NicoKiaru thanks for sharing, good to know...

@dgault I have a suggestion... I assume that the problems of the new reader with your test data arise from unexpected XML layout and not so much from the image binary data itself. Would it be an option for you to share only the image XML with me privately?

If you have only LIF files available, you could copy the XML using a hex editor, or alternatively you could e.g. use a LASX Offline to open LIF files and save them as e.g. XLEF/TIF files, and then it would suffice for me if you sent me the XLIF files, which can be found in metadata folders. You can also obfuscate any image names, file paths and user-comments in the XML.

Having the image XML, I can create some "fake" image datasets with it for local testing.

Would this be an option?

XLEFReaderForBioformats avatar Mar 25 '24 09:03 XLEFReaderForBioformats

hi @dgault @melissalinkert, could you possibly support me with the latest build errors? my changes built locally, and for some reason I cannot see the build logs of the failed builds (it only displays "this job failed")...

XLEFReaderForBioformats avatar Apr 04 '24 07:04 XLEFReaderForBioformats

@XLEFReaderForBioformats, Im not sure why those jobs failed in that way but I re-triggered them again and they should now all be green

dgault avatar Apr 08 '24 14:04 dgault

@dgault I'd like to ask if there are any updates regarding the test data? It would be helpful for me to know at least if you are planning to share any data with me or not, right now it is a bit unclear to me how to proceed with this PR.

XLEFReaderForBioformats avatar Apr 30 '24 07:04 XLEFReaderForBioformats

Hi @XLEFReaderForBioformats, the next step will be to reinclude the PR in the nightly CI, testing the new reader rather than the legacy reader. If we can get a full run of the test suite then we will at least be able to identify the number of files which are causing failures and judge the scale of the impact. From there we can decide how best to proceed.

I will re-include the PR now so we can have the new reader in the nightly CI for the purposes of the testing.

dgault avatar May 01 '24 14:05 dgault

Hi @XLEFReaderForBioformats, as a follow up, the PR has been included in our test suite for a number of days and we are seeing failures for a number of files, though most are for the same exception. You can see the LIF failures here: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/7512/consoleFull

   [testng] java.lang.IndexOutOfBoundsException: Index 751 out of bounds for length 13
   [testng] 	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
   [testng] 	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
   [testng] 	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
   [testng] 	at java.base/java.util.Objects.checkIndex(Objects.java:385)
   [testng] 	at java.base/java.util.ArrayList.get(ArrayList.java:427)
   [testng] 	at loci.formats.in.LMSLIFReader.initFile(LMSLIFReader.java:461)
   [testng] 	at loci.formats.FormatReader.setId(FormatReader.java:1480)
   [testng] 	at loci.formats.in.LIFDelegateReader.setId(LIFDelegateReader.java:37)
   [testng] 	at loci.formats.ImageReader.setId(ImageReader.java:864)

The failures seem to be caused by the following line when retrieving the offsets: https://github.com/ome/bioformats/pull/4000/files#diff-79e103bcccaa0d842ec61c1bf11e438d3ace6607e364135a7e08b3947f2dec7cR461

The following sample files which failed are available for testing from here. These should be good samples to test for the IndexOutOfBoundsException

  • michael/PR2729_frameOrderCombinedScanTypes.lif
  • imagesc-30856/20191025 Test FRET 585. 423, 426.lif
  • seanwarren/150519_FRAP_test_ROIs_chromagreen/150519_FRAP_test_ROIs_chromagreen.lif

dgault avatar May 07 '24 12:05 dgault