Tiler failing with use_largeimage=True

Open K-Rakovic opened this issue 2 years ago • 8 comments

Hi everyone,

I have a dataset consisting of large WSIs in Hamamatsu .ndpi format, many of which are >5 GB in size. I am able to initialise the slide using something like this:

from histolab.slide import Slide

test_img = Slide('/path/to/image.ndpi', processed_path='/path/to/output', use_largeimage=True)

And do basic tasks, such as:

from histolab.masks import TissueMask

all_tissue_mask = TissueMask()
test_img.locate_mask(all_tissue_mask)

This reads the file, generates a mask, and outputs the result. I then try to extract tiles:

from histolab.tiler import GridTiler

gtiler = GridTiler(
    tile_size=(224,224),
    check_tissue=True,
    tissue_percent=60,
    pixel_overlap=0,
    mpp=1.8
)

gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')

This fails, with the error:

histolab.exceptions.HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.

This is the same error you get if you try to load a large image without setting use_largeimage=True. I would expect the use_largeimage flag to be passed through to the tiler, but this does not appear to be happening.

histolab v0.6
python 3.8

EDIT: typographical error

K-Rakovic avatar May 31 '23 12:05 K-Rakovic

Hi @k-rakovic thank you for opening this issue!

The use_largeimage flag is to be passed to the Slide, which internally handles the backend that needs to be used to read such a slide.

Anyway, I see that you're passing tile_img to gtiler.extract, and not test_img, is it intended?

alessiamarcolini avatar May 31 '23 13:05 alessiamarcolini

Thanks for replying so soon. Sorry, that was a typo in the post. I am passing test_img, not tile_img, to gtiler.extract, but the use_largeimage flag does not appear to follow it.

The code should read:

from histolab.slide import Slide
from histolab.tiler import GridTiler
from histolab.masks import TissueMask

test_img = Slide('/path/to/image.ndpi', processed_path='/path/to/output', use_largeimage=True)

all_tissue_mask = TissueMask()
test_img.locate_mask(all_tissue_mask)

gtiler = GridTiler(
    tile_size=(224,224),
    check_tissue=True,
    tissue_percent=60,
    pixel_overlap=0,
    mpp=1.8
)

gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')

The full error log is:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
OpenSlideError                            Traceback (most recent call last)
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:736, in Slide._wsi(self)
    735 try:
--> 736     slide = openslide.open_slide(self._path)
    737 except PIL.UnidentifiedImageError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:430, in open_slide(filename)
    429 try:
--> 430     return OpenSlide(filename)
    431 except OpenSlideUnsupportedFormatError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:166, in OpenSlide.__init__(self, filename)
    165 self._filename = filename
--> 166 self._osr = lowlevel.open(str(filename))

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/lowlevel.py:199, in _check_open(result, _func, _args)
    198 if err is not None:
--> 199     raise OpenSlideError(err)
    200 return slide

OpenSlideError: Can't validate JPEG for directory 0: Expected marker at 4294972598, found none

During handling of the above exception, another exception occurred:
...
    743 except Exception as other_error:
--> 744     raise HistolabException(other_error.__repr__() + f". {bad_format_error}")
    745 return slide

HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.

K-Rakovic avatar May 31 '23 13:05 K-Rakovic

Thank you, the error log is useful, but I see it's only a partial stack trace (Output exceeds the size limit. Open the full output data in a text editor). Could you post the whole thing?

alessiamarcolini avatar May 31 '23 13:05 alessiamarcolini

Of course, this is the whole output:

---------------------------------------------------------------------------
OpenSlideError                            Traceback (most recent call last)
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:736, in Slide._wsi(self)
    735 try:
--> 736     slide = openslide.open_slide(self._path)
    737 except PIL.UnidentifiedImageError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:430, in open_slide(filename)
    429 try:
--> 430     return OpenSlide(filename)
    431 except OpenSlideUnsupportedFormatError:

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:166, in OpenSlide.__init__(self, filename)
    165 self._filename = filename
--> 166 self._osr = lowlevel.open(str(filename))

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/lowlevel.py:199, in _check_open(result, _func, _args)
    198 if err is not None:
--> 199     raise OpenSlideError(err)
    200 return slide

OpenSlideError: Can't validate JPEG for directory 0: Expected marker at 4294972598, found none

During handling of the above exception, another exception occurred:

HistolabException                         Traceback (most recent call last)
/raid/users/kr151p/histolab.ipynb Cell 3 in ()
      1 from histolab.tiler import GridTiler
      3 gtiler = GridTiler(
      4     tile_size=(224,224),
      5     check_tissue=True,
   (...)
      8     mpp=1.8
      9 )
---> 11 gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/tiler.py:384, in GridTiler.extract(self, slide, extraction_mask, log_level)
    382 level = logging.getLevelName(log_level)
    383 logger.setLevel(level)
--> 384 self._validate_level(slide)
    385 self.tile_size = self._tile_size(slide)
    386 self.pixel_overlap = int(self._scale_factor(slide) * self.pixel_overlap)

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/tiler.py:279, in Tiler._validate_level(self, slide)
    266 def _validate_level(self, slide: Slide) -> None:
    267     """Validate the Tiler's level according to the Slide.
    268 
    269     Parameters
   (...)
    277         If the level is not available for the slide
    278     """
--> 279     if len(slide.levels) - abs(self.level) < 0:
    280         raise LevelError(
    281             f"Level {self.level} not available. Number of available levels: "
    282             f"{len(slide.levels)}"
    283         )

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:359, in Slide.levels(self)
    350 @lazyproperty
    351 def levels(self) -> List[int]:
    352     """Slide's available levels
    353 
    354     Returns
   (...)
    357         The levels available
    358     """
--> 359     return list(range(len(self._wsi.level_dimensions)))

File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:744, in Slide._wsi(self)
    740     raise FileNotFoundError(
    741         f"The wsi path resource doesn't exist: {self._path}"
    742     )
    743 except Exception as other_error:
--> 744     raise HistolabException(other_error.__repr__() + f". {bad_format_error}")
    745 return slide

HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.

K-Rakovic avatar May 31 '23 13:05 K-Rakovic

Hi @k-rakovic, is the .ndpi you're using available somewhere on the internet, or is it a private/legacy WSI?

ernestoarbitrio avatar May 31 '23 13:05 ernestoarbitrio

It is unfortunately part of a private dataset so I can't share it. I can view the image in something like QuPath so I know the image file itself is not corrupt.
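
A side observation, offered as an assumption rather than a confirmed diagnosis: the offset in the error message, 4294972598, lies just past 2**32. That would be consistent with the failure being tied to the size of the file rather than to corrupt data. The arithmetic is easy to check:

```python
# Offset reported in the OpenSlideError message above.
offset = 4294972598

# First value that no longer fits in an unsigned 32-bit integer.
limit_32bit = 2**32

# Is the offset beyond the 32-bit range, and by how much?
beyond_32bit = offset > limit_32bit
excess = offset - limit_32bit

print(beyond_32bit)  # True
print(excess)        # 5302
```

This only shows where the offset falls numerically. It does not by itself establish why openslide rejects the file.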

K-Rakovic avatar May 31 '23 13:05 K-Rakovic

Ok so actually @k-rakovic you found a bug 🥇

It turns out that Slide.levels, which is called by Tiler._validate_level(slide), ignores the use_largeimage flag and always goes through openslide. We did not notice this because the tests that set use_largeimage=True use the CMU_1_SMALL_REGION slide, which is readable by openslide. To test and fix this, we should use another slide that is not compatible with openslide.
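
A minimal sketch of this behaviour, as a toy model rather than histolab's actual code: ToySlide, BackendError, levels_buggy and levels_fixed are all hypothetical names, and the level dimensions are made up. It only illustrates how a property that always takes the openslide path fails on a slide openslide cannot read, even when the flag is set, and how dispatching on the flag avoids that.

```python
class BackendError(Exception):
    """Stand-in for the error raised when the openslide path cannot read a file."""


class ToySlide:
    """Toy model of a slide with two possible reading backends (hypothetical)."""

    def __init__(self, openslide_readable: bool, use_largeimage: bool = False):
        self._openslide_readable = openslide_readable
        self._use_largeimage = use_largeimage

    def _open_with_openslide(self) -> list:
        # Models the openslide path: fails for files it cannot parse.
        if not self._openslide_readable:
            raise BackendError("Can't validate JPEG for directory 0")
        return [(4096, 4096), (1024, 1024)]  # made-up level dimensions

    def _open_with_largeimage(self) -> list:
        # Models the alternative backend, assumed able to read the file.
        return [(4096, 4096), (1024, 1024)]

    @property
    def levels_buggy(self) -> list:
        # Always takes the openslide path and never looks at the flag.
        return list(range(len(self._open_with_openslide())))

    @property
    def levels_fixed(self) -> list:
        # Chooses the backend according to the flag before reading.
        if self._use_largeimage:
            dims = self._open_with_largeimage()
        else:
            dims = self._open_with_openslide()
        return list(range(len(dims)))


# A slide that only the alternative backend can read, with the flag set.
big_slide = ToySlide(openslide_readable=False, use_largeimage=True)
try:
    big_slide.levels_buggy
except BackendError as err:
    print(f"buggy path failed: {err}")
print(big_slide.levels_fixed)

# A slide both backends can read: the buggy path succeeds, hiding the problem.
small_slide = ToySlide(openslide_readable=True, use_largeimage=True)
print(small_slide.levels_buggy)
```

The second case mirrors why a test slide that openslide can read would not expose the issue: both code paths return the same levels, so nothing fails.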

alessiamarcolini avatar May 31 '23 14:05 alessiamarcolini

@alessiamarcolini As I said above, I unfortunately can't share any images but I'm happy to help if I can (please note though I'm a pathologist rather than a developer...!). It seems existing image tiling methods which support large ndpi images are thin on the ground so it would be awesome if yours could work!

K-Rakovic avatar May 31 '23 17:05 K-Rakovic