Pyramid + Tiled SVS (TIFF) Reading

Open folterj opened this issue 3 years ago • 14 comments

System and Software

  • aicsimageio Version: 3.3.4
  • Python Version: 3.8
  • Operating System: Win10 64

Description

None of the functions appear to work for large svs files. Aperio svs files are internally multi-page multi-tile OME Tiff files. All functions appear to result in a numpy MemoryError. In addition, after looking at all the examples and source code, it is still unclear to me how to load a single tile or part of an image.

Reproduction

reader = TiffReader(filename)
reader.shape  # error
reader.dask_data  # error
reader.get_image_dask_data(reader.dims, ?)  # error
imread_dask(filename)  # error

Perhaps this is caused here: tiff_reader ~line 128 (which would attempt to get an array of the whole image of page 0 - by default the largest image of the pyramid file):

# Get sample yx plane
sample = scenes[0].pages[0].asarray()

folterj, Apr 08 '21 12:04

Hey @folterj sorry to hear you are having difficulties.

  1. I would love to see what happens to this file in our 4.0 API. If you have the time can you:
pip install aicsimageio --upgrade --pre

which should upgrade your version to the latest dev release. I just want to see if you get the same error.

  2. If this is a pyramid file, we have an open issue about settling on an API to handle them: #140, which is slated for 4.1 support. Specifically, we are considering adopting an API like the following:
img = AICSImage("my-pyramid.czi")  # or Reader
img.levels  # returns tuple of levels
img.current_level  # returns current level, default / starts at level 0 (full resolution)
img.set_level  # update current level
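
As a rough illustration of how such a stateful level selector could behave, here is a toy sketch. `PyramidImage` and its `level_shapes` field are hypothetical stand-ins invented for this example; nothing here is aicsimageio code, and the real API from #140 may look different.

```python
from dataclasses import dataclass


@dataclass
class PyramidImage:
    """Toy stand-in for an image with several resolution levels (hypothetical)."""

    level_shapes: tuple  # one (Y, X) shape per level; level 0 is full resolution
    current_level: int = 0

    @property
    def levels(self) -> tuple:
        """Indices of the available levels."""
        return tuple(range(len(self.level_shapes)))

    @property
    def shape(self) -> tuple:
        """Shape of the currently selected level."""
        return self.level_shapes[self.current_level]

    def set_level(self, level: int) -> None:
        """Switch the level that later reads would use."""
        if level not in self.levels:
            raise IndexError(f"level {level} not in {self.levels}")
        self.current_level = level


img = PyramidImage(level_shapes=((4096, 4096), (1024, 1024), (256, 256)))
img.set_level(2)
print(img.levels, img.shape)
```

The point of keeping the level as state on the object is that each level can have its own shape without all levels being packed into one array.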

evamaxfield, Apr 08 '21 15:04

Hi @JacksonMaxfield thanks for following up so soon.

I've installed the latest dev version as you suggested. Using this:

  • reader.shape works correctly (e.g. (71606, 85372, 3))
  • lazy = imread_dask(filename) also works (e.g. (1, 1, 1, 71606, 85372, 3))

So it seems these issues are already resolved in the latest version.

However, the following still gives an out-of-memory error (assuming this is correct usage):

test = lazy[0,0,0,0:10,0:10,:]
test.compute()

Also, I'm unable to identify the correct syntax to get a single tile or part of the image using get_image_dask_data(), after trying various arguments. A single example that doesn't just load a whole slide would be wonderful.

folterj, Apr 08 '21 15:04

Yea this is what I suspected unfortunately.

So our chunking on the dask array happens on the ZYXS dimensions (all spatial dimensions + RGB / "samples"), which you can customize, but that wouldn't help in this case. It looks like what is happening in 4.0 is that the memory error is simply delayed until you call compute, which makes sense if you know how we chunk the file.

That chunk is requested in whole, so we try to load all of that chunk's data into memory using tifffile / numpy, and numpy can't allocate that much memory.
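
To put a number on this, here is a small back-of-the-envelope sketch using the shape reported earlier in the thread. It assumes one byte per sample (uint8 RGB), which the thread does not state; `plane_nbytes` is a helper made up for this example.

```python
import math


def plane_nbytes(shape, bytes_per_sample=1):
    """Bytes needed to hold an uncompressed array of the given shape."""
    return math.prod(shape) * bytes_per_sample


full_plane = (71606, 85372, 3)  # shape reported by reader.shape in this thread
crop = (10, 10, 3)              # the region actually requested

# Assumption: uint8 samples, i.e. 1 byte each.
print(f"whole-plane chunk: {plane_nbytes(full_plane) / 2**30:.1f} GiB")
print(f"requested crop:    {plane_nbytes(crop)} bytes")
```

With a single chunk covering the whole YXS plane, even a 10x10 crop forces the full plane to be decoded, which under the uint8 assumption is on the order of 17 GiB.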

So a couple more questions:

  1. Is this a pyramid file?
  2. You mentioned tiles, is this also a mosaic file?

evamaxfield, Apr 08 '21 15:04

@JacksonMaxfield thank you for elaborating. To enable performant loading of slides, it is of course essential to be able to load slides in parts, as openslide, tifffile, etc. support. We're interested in getting small patches for use in deep learning. As openslide is slow when reading many tiles from many slides, I've written a variation using tifffile, which is already a bit faster without performance tuning. I was hoping your library would work for this purpose.

Our source files are Aperio svs, which are really (OME) Tiff files, pyramid indeed (using steps of 4x). Each level is tiled as well (256x256x4), JP2K compressed. We are currently using the public TCGA pathology dataset for testing.
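
For a feel of the sizes involved, the sketch below estimates level shapes and tile counts for a 4x pyramid with 256x256 tiles. It is only an approximation: the true level dimensions are stored in the file and may be rounded differently, and `pyramid_shapes`, `tile_grid`, and the choice of three levels are assumptions made for this example.

```python
def pyramid_shapes(height, width, factor=4, n_levels=3):
    """Approximate (Y, X) shape of each level when every level is `factor`x smaller."""
    return [(height // factor**level, width // factor**level)
            for level in range(n_levels)]


def tile_grid(height, width, tile=256):
    """Number of tile rows and columns needed to cover an image (ceiling division)."""
    return -(-height // tile), -(-width // tile)


# Full-resolution YX size taken from the shape reported in this thread.
for level, (h, w) in enumerate(pyramid_shapes(71606, 85372)):
    rows, cols = tile_grid(h, w)
    print(f"level {level}: {h} x {w} px, {rows} x {cols} tiles")
```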

I forgot to mention, the format that dims(.order) returns is YXS (and I believe it showed as SYXC in version 3.3.4)

folterj, Apr 08 '21 15:04

Okay this is very helpful.

So here is what I can say:

  1. As linked above, we don't have support for multi-level pyramid reading just yet in 4.0. (#140)
  2. In 4.0 we can read and create a dask array specific to the mosaic. https://github.com/AllenCellModeling/aicsimageio/blob/main/aicsimageio/readers/reader.py#L402

I.E.

from aicsimageio import AICSImage

stitched_image = AICSImage("tiled.lif")
stitched_image.dims  # very large Y and X
stitched_image.dask_data[0, 0, 0, :300, :300, :].compute()  # would get first 300 y x pixels regardless of tile boundary
# but may load multiple tiles in the process, depending on chunk border
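
The remark about chunk borders can be made concrete with a little index arithmetic. This is a generic sketch, not the aicsimageio mosaic code; `tiles_for_region` is a hypothetical helper and the 256-pixel tile size is taken from the SVS description earlier in the thread.

```python
def tiles_for_region(y0, y1, x0, x1, tile=256):
    """Return (row, col) indices of every tile overlapping the region [y0:y1, x0:x1]."""
    if y1 <= y0 or x1 <= x0:
        raise ValueError("region must be non-empty")
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    return [(r, c) for r in rows for c in cols]


# A 300x300 crop starting at the origin crosses one tile border in each direction.
print(tiles_for_region(0, 300, 0, 300))
# A 10x10 crop stays inside a single tile.
print(tiles_for_region(0, 10, 0, 10))
```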

We have dask stitched mosaic tiles done for LIF files for example (see here) but not for TIFFs. If you wanted to take a stab at adding support for that I think we would all be incredibly grateful and happily accept the PR. Otherwise, if you can drop a link to the dataset where I can download a file or two I can maybe try adding it in soon but no real promises as I have a lot of work to catch up on my other projects + day job :joy:

evamaxfield, Apr 08 '21 15:04

> I forgot to mention, the format that dims(.order) returns is YXS (and I believe it showed as SYXC in version 3.3.4)

Yea we changed "S" from meaning "Scenes" to "Samples". It's a weird change but it allows us to actually properly support scenes as a stateful property of the object. Scenes can be different shapes, dtypes, etc so stateful is better than packing all scenes into a single array.

evamaxfield, Apr 08 '21 16:04

Ah ok that clarifies some things - but at the same time I'm confused - the reason I got here is stumbling on this: https://github.com/AllenCellModeling/aicsimageio/issues/178 as this interestingly implied it could read random chunks efficiently from Tiff files (ignoring multiple levels), but I probably misinterpreted. By the way the Tiff stitching is relatively easy - I was able to quickly write an unoptimised class for multi-page/tile reading based on this: https://gist.github.com/rfezzani/b4b8852c5a48a901c1e94e09feb34743 - of course it's not written for performance.

Thank you for the example. In my case I needed to reduce the dimensions to this instead:

reader.dask_data[0:300, 0:300, :].compute()

Which unfortunately gives the memory error again.

PS: In case it's more appropriate to continue on https://forum.image.sc/ then I'm happy to summarise & move there

folterj, Apr 08 '21 17:04

> Ah ok that clarifies some things - but at the same time I'm confused - the reason I got here is stumbling on this: #178

Ahh yes. A bit hard to explain, but that PR resolved "non-YX" dimension chunked reads. This is maybe a fault in our design decisions, but we read a whole YX plane as the "lowest level" chunk. You can't get a smaller chunk than the whole YX plane for non-tiled images. The bug report you linked / the PR resolving it basically sped up the chunked reads for "give me the whole ZYX dimension chunk", or "the whole TYX dimension chunk", or "give me just the first 5 time frames but the whole YX plane for each frame". The pattern being: we optimized all chunk reading, as long as the whole YX plane was requested.

This was a decision made because we personally hadn't encountered the need to read mosaic (or, "YX tile chunk") files. But we started getting a lot of requests to do so. So we came up with the mosaic API that I linked above where we create a dask array specific for mosaic "tile chunk" reading.

So long story short: for non-YX chunking you use the normal API; for YX / "tile chunk" reading you will use the "mosaic" API.
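
To see why the two cases differ, the sketch below compares how much data has to be read for the same selection under plane-sized chunks and under tile-sized chunks. It is a simplified model (edge chunks are treated as full size and samples are assumed to be one byte), and `chunks_read` and `bytes_read` are hypothetical helpers, not part of dask or aicsimageio.

```python
import math


def chunks_read(selection, chunk_shape):
    """Count the chunks overlapped by a selection given as (start, stop) per axis."""
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        count *= (stop - 1) // size - start // size + 1
    return count


def bytes_read(selection, chunk_shape, bytes_per_sample=1):
    """Upper bound on bytes loaded: every overlapped chunk is read in full."""
    return chunks_read(selection, chunk_shape) * math.prod(chunk_shape) * bytes_per_sample


plane_chunks = (1, 4096, 4096)  # one chunk per whole YX plane (TYX array)
tile_chunks = (1, 256, 256)     # one chunk per 256x256 tile

first_five_frames = [(0, 5), (0, 4096), (0, 4096)]
small_crop = [(0, 1), (0, 300), (0, 300)]

for name, sel in [("first 5 frames", first_five_frames), ("300x300 crop", small_crop)]:
    print(name, bytes_read(sel, plane_chunks), bytes_read(sel, tile_chunks))
```

Selecting whole planes costs the same either way, while the small crop reads one full plane with plane-sized chunks but only four tiles with tile-sized chunks.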

> By the way the Tiff stitching is relatively easy - I was able to quickly write an unoptimised class for multi-page/tile reading based on this: https://gist.github.com/rfezzani/b4b8852c5a48a901c1e94e09feb34743 - of course it's not written for performance.

This is a big help. Thanks! I can probably try to add something for tiled reading for TIFFs this weekend or next week then. (The hard part will be making OmeTiffReader and TiffReader play nice with mosaic images).

> Thank you for the example. In my case I needed to reduce the dimensions to this instead: reader.dask_data[0:300, 0:300, :].compute()

Yep, sorry, I should have clarified: that will only work if the Reader supports mosaic tile reading, which TiffReader currently doesn't.

Can I get a link to an example file that would be good for testing this development?

evamaxfield, Apr 08 '21 18:04

Just a quick clarification with regards to the initial statement

> Aperio svs files are internally multi-page multi-tile OME Tiff files.

Aperio SVS and OME-TIFF are completely separate file formats although both of them use TIFF as the underlying container format. Aperio SVS was introduced years before the multi-resolution extension to OME-TIFF. They use different storage mechanisms for the pyramidal levels as well as different metadata.

As mentioned above, tifffile can read many TIFF-based file formats, including Aperio SVS. At the level of aicsimageio, I would expect TiffReader rather than OmeTiffReader to be the relevant reader when accessing this type of data.

sbesson, Apr 08 '21 18:04

Thanks @sbesson, that's a huge help.

I was particularly worried about how the OmeTiffReader would respond to a change like this because it inherits from TiffReader in our codebase.

evamaxfield, Apr 08 '21 18:04

Hi @sbesson thank you for the info. Interestingly the svs files from this data-set do have OME XML meta-data. But I admit I don't know the history and precise differences of these formats. I'm indeed using TiffReader which conveniently appears to provide full support for this format.

By the way @JacksonMaxfield, I did not write the code in the link, credit goes to Riadh Fezzani.

Link to TCGA repository (LUAD specifically): https://portal.gdc.cancer.gov/projects/TCGA-LUAD (go to Files)
Some more info: https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD

folterj, Apr 08 '21 19:04

Sorry for such a delay in getting back.

So @folterj we talked about this a bit and we think the best solution would be to create an SvsReader class that would be based off of TiffReader but would check for specific metadata. Our reasoning here is that to know how to stitch the tiles back together we need the metadata.
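
A metadata check of this kind might look roughly like the sketch below. It assumes the TIFF ImageDescription of an SVS file starts with "Aperio" and carries "key = value" fields separated by "|", which is the layout commonly described for Aperio files but is not confirmed anywhere in this thread. `parse_aperio_description` is a hypothetical helper and the sample string is made up for illustration.

```python
def parse_aperio_description(description):
    """Split an Aperio-style ImageDescription into a header and key/value fields."""
    # Assumption: SVS descriptions begin with the vendor name "Aperio".
    if not description.startswith("Aperio"):
        raise ValueError("not an Aperio-style description")
    header, *fields = description.split("|")
    metadata = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # skip fields that are not "key = value"
            metadata[key.strip()] = value.strip()
    return header.strip(), metadata


# Illustrative string only; real files will carry different values.
sample = (
    "Aperio Image Library v11.2.1\r\n"
    "46000x32914 [0,100 46000x32814] (256x256) JPEG/RGB Q=30"
    "|AppMag = 20|MPP = 0.4990"
)
header, metadata = parse_aperio_description(sample)
print(metadata)
```

A dedicated reader could use a check like this to decide whether it should handle a file before falling back to the generic TIFF path.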

Anyway, I really haven't had any time and this will require a bit more work than originally expected so I am here to unfortunately say that this won't make it into 4.0 (at least on my own time).

evamaxfield, Apr 19 '21 21:04

Hi @JacksonMaxfield, thanks for giving this more thought. Accessing the svs files with TiffReader is easy and doesn't require metadata other than that provided by TiffReader, including the essential tags populated as properties. I've now created a repo which may be of use: https://github.com/folterj/TiffSlideReader

folterj, Apr 22 '21 16:04

Interesting.... Thanks for sharing.

I guess I should have clarified that while SVS may not require metadata, other "TiffLike" formats may, in which case separating them into their own readers would be better. (This is just a problem of so many formats being similar but slightly different.)

The decision was very much for the long term sanity / development of the whole package.

evamaxfield, Apr 22 '21 16:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot], Mar 30 '23 01:03