Tiler failing with use_largeimage=True
Hi everyone,
I have a dataset consisting of large WSIs in Hamamatsu .ndpi format, many of which are >5 GB in size. I am able to initialise the slide using something like this:
from histolab.slide import Slide
test_img = Slide('/path/to/image.ndpi', processed_path='/path/to/output', use_largeimage=True)
And do basic tasks, such as:
from histolab.masks import TissueMask
all_tissue_mask = TissueMask()
test_img.locate_mask(all_tissue_mask)
This reads the file and generates a mask, outputting the result.
from histolab.tiler import GridTiler
gtiler = GridTiler(
    tile_size=(224, 224),
    check_tissue=True,
    tissue_percent=60,
    pixel_overlap=0,
    mpp=1.8
)
gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')
This fails, with the error:
histolab.exceptions.HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.
This is the same error you get if you try to load a large image without setting use_largeimage=True. I would expect the use_largeimage flag to be passed on to the tiler, but this does not appear to be happening.
histolab v0.6
python 3.8
EDIT: typographical error
Hi @k-rakovic thank you for opening this issue!
The use_largeimage flag is meant to be passed to the Slide, which then internally selects the backend used to read the slide.
Anyway, I see that you're passing tile_img to gtiler.extract, and not test_img, is it intended?
Thanks for replying so soon. Sorry, that was a typo in the post. I am passing test_img, not tile_img, to gtiler.extract, but the use_largeimage flag does not appear to follow it.
The code should read:
from histolab.slide import Slide
from histolab.tiler import GridTiler
from histolab.masks import TissueMask
test_img = Slide('/path/to/image.ndpi', processed_path='/path/to/output', use_largeimage=True)
all_tissue_mask = TissueMask()
test_img.locate_mask(all_tissue_mask)
gtiler = GridTiler(
    tile_size=(224, 224),
    check_tissue=True,
    tissue_percent=60,
    pixel_overlap=0,
    mpp=1.8
)
gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')
The full error log is:
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
OpenSlideError Traceback (most recent call last)
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:736, in Slide._wsi(self)
735 try:
--> 736 slide = openslide.open_slide(self._path)
737 except PIL.UnidentifiedImageError:
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:430, in open_slide(filename)
429 try:
--> 430 return OpenSlide(filename)
431 except OpenSlideUnsupportedFormatError:
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:166, in OpenSlide.__init__(self, filename)
165 self._filename = filename
--> 166 self._osr = lowlevel.open(str(filename))
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/lowlevel.py:199, in _check_open(result, _func, _args)
198 if err is not None:
--> 199 raise OpenSlideError(err)
200 return slide
OpenSlideError: Can't validate JPEG for directory 0: Expected marker at 4294972598, found none
During handling of the above exception, another exception occurred:
...
743 except Exception as other_error:
--> 744 raise HistolabException(other_error.__repr__() + f". {bad_format_error}")
745 return slide
HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.
Thank you, the error log is useful but I see it's only a partial stack trace (Output exceeds the size limit. Open the full output data in a text editor), could you post it whole?
Of course, this is the whole output:
---------------------------------------------------------------------------
OpenSlideError Traceback (most recent call last)
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:736, in Slide._wsi(self)
735 try:
--> 736 slide = openslide.open_slide(self._path)
737 except PIL.UnidentifiedImageError:
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:430, in open_slide(filename)
429 try:
--> 430 return OpenSlide(filename)
431 except OpenSlideUnsupportedFormatError:
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/__init__.py:166, in OpenSlide.__init__(self, filename)
165 self._filename = filename
--> 166 self._osr = lowlevel.open(str(filename))
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/openslide/lowlevel.py:199, in _check_open(result, _func, _args)
198 if err is not None:
--> 199 raise OpenSlideError(err)
200 return slide
OpenSlideError: Can't validate JPEG for directory 0: Expected marker at 4294972598, found none
During handling of the above exception, another exception occurred:
HistolabException Traceback (most recent call last)
/raid/users/kr151p/histolab.ipynb Cell 3 in ()
1 from histolab.tiler import GridTiler
3 gtiler = GridTiler(
4 tile_size=(224,224),
5 check_tissue=True,
(...)
8 mpp=1.8
9 )
---> 11 gtiler.extract(test_img, extraction_mask=all_tissue_mask, log_level='INFO')
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/tiler.py:384, in GridTiler.extract(self, slide, extraction_mask, log_level)
382 level = logging.getLevelName(log_level)
383 logger.setLevel(level)
--> 384 self._validate_level(slide)
385 self.tile_size = self._tile_size(slide)
386 self.pixel_overlap = int(self._scale_factor(slide) * self.pixel_overlap)
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/tiler.py:279, in Tiler._validate_level(self, slide)
266 def _validate_level(self, slide: Slide) -> None:
267 """Validate the Tiler's level according to the Slide.
268
269 Parameters
(...)
277 If the level is not available for the slide
278 """
--> 279 if len(slide.levels) - abs(self.level) < 0:
280 raise LevelError(
281 f"Level {self.level} not available. Number of available levels: "
282 f"{len(slide.levels)}"
283 )
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:359, in Slide.levels(self)
350 @lazyproperty
351 def levels(self) -> List[int]:
352 """Slide's available levels
353
354 Returns
(...)
357 The levels available
358 """
--> 359 return list(range(len(self._wsi.level_dimensions)))
File ~/anaconda3/envs/histolab/lib/python3.8/site-packages/histolab/slide.py:744, in Slide._wsi(self)
740 raise FileNotFoundError(
741 f"The wsi path resource doesn't exist: {self._path}"
742 )
743 except Exception as other_error:
--> 744 raise HistolabException(other_error.__repr__() + f". {bad_format_error}")
745 return slide
HistolabException: OpenSlideError("Can't validate JPEG for directory 0: Expected marker at 4294972598, found none"). This slide may be corrupted or have a non-standard format not handled by the openslide and PIL libraries. Consider setting use_largeimage to True when instantiating this Slide.
Hi @k-rakovic is that .ndpi you're using available somewhere on the internet or is a private/legacy wsi?
It is unfortunately part of a private dataset so I can't share it. I can view the image in something like QuPath so I know the image file itself is not corrupt.
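A small arithmetic check on the offset in the error message supports the view that the file is not corrupt. The byte position 4294972598 lies just past 2**32, which would be consistent with a reader that struggles with offsets beyond 4 GiB. That interpretation is an inference from the number alone and is not confirmed anywhere in this thread.

```python
# Offset reported in the OpenSlideError message quoted in this thread.
reported_offset = 4294972598

# First value that no longer fits in an unsigned 32-bit integer (4 GiB).
limit_32bit = 2 ** 32

# How far past the 4 GiB boundary the reader was looking for a JPEG marker.
bytes_past_limit = reported_offset - limit_32bit

print(reported_offset > limit_32bit)  # True
print(bytes_past_limit)               # 5302
```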
Ok so actually @k-rakovic you found a bug 🥇
It turns out that Slide.levels, called by Tiler._validate_level(slide), ignores the use_largeimage flag: as the traceback shows, it always goes through Slide._wsi, which opens the file with openslide.
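To illustrate the kind of bug described here, the following is a minimal, self-contained sketch. It is not histolab's actual code or the eventual fix: SketchSlide, FakeOpenSlideBackend, FakeLargeImageBackend, levels_buggy and levels_fixed are all hypothetical names, and the level dimensions are made up. It only shows a property that accepts a flag at construction time but never consults it, next to one that does.

```python
from typing import List, Tuple


class FakeOpenSlideBackend:
    """Hypothetical stand-in for an openslide handle that cannot open the file."""

    def __init__(self, path: str) -> None:
        raise RuntimeError(f"openslide-like backend cannot open {path}")


class FakeLargeImageBackend:
    """Hypothetical stand-in for a large_image source; dimensions are made up."""

    def __init__(self, path: str) -> None:
        self.level_dimensions: List[Tuple[int, int]] = [
            (4096, 4096),
            (2048, 2048),
            (1024, 1024),
        ]


class SketchSlide:
    """Hypothetical slide wrapper, used only to illustrate the reported bug."""

    def __init__(self, path: str, use_largeimage: bool = False) -> None:
        self._path = path
        self._use_largeimage = use_largeimage

    @property
    def levels_buggy(self) -> List[int]:
        # The flag is stored but never checked: always the openslide-like path.
        backend = FakeOpenSlideBackend(self._path)
        return list(range(len(backend.level_dimensions)))

    @property
    def levels_fixed(self) -> List[int]:
        # Pick the backend according to the flag before asking for levels.
        if self._use_largeimage:
            backend = FakeLargeImageBackend(self._path)
        else:
            backend = FakeOpenSlideBackend(self._path)
        return list(range(len(backend.level_dimensions)))
```

With `SketchSlide("/path/to/image.ndpi", use_largeimage=True)`, reading `levels_fixed` returns the level indices, while reading `levels_buggy` raises even though the flag was set, which mirrors the behaviour reported in this issue.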
We did not notice this because the tests that set use_largeimage=True use the CMU_1_SMALL_REGION slide, which openslide can read. To test and fix this, we should use another slide that is not compatible with openslide.
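The effect of the fixture choice can be sketched without any real slide. The code below is a hypothetical illustration, not part of histolab's test suite: make_openslide_opener and levels_ignoring_flag are invented names, and the dimensions are made up. It shows that a code path which ignores the flag still passes when the fixture is readable by the openslide-like opener, and only fails once the fixture is unreadable by it.

```python
from typing import Callable, List, Tuple

Dimensions = List[Tuple[int, int]]


def make_openslide_opener(readable: bool) -> Callable[[str], Dimensions]:
    """Build a hypothetical opener that mimics openslide on a given fixture."""

    def opener(path: str) -> Dimensions:
        if not readable:
            raise RuntimeError(f"openslide-like opener cannot read {path}")
        return [(2000, 3000), (1000, 1500)]  # made-up level dimensions

    return opener


def levels_ignoring_flag(
    path: str,
    use_largeimage: bool,
    openslide_opener: Callable[[str], Dimensions],
) -> List[int]:
    # Mirrors the reported behaviour: use_largeimage is accepted but unused.
    return list(range(len(openslide_opener(path))))


def passes_with_readable_fixture() -> bool:
    # A fixture readable by the openslide-like opener hides the bug.
    levels = levels_ignoring_flag("small.svs", True, make_openslide_opener(True))
    return levels == [0, 1]


def fails_with_unreadable_fixture() -> bool:
    # A fixture the openslide-like opener cannot read exposes the bug.
    try:
        levels_ignoring_flag("large.ndpi", True, make_openslide_opener(False))
    except RuntimeError:
        return True
    return False
```

Both helper checks return True, which is the situation described above: the existing tests could not distinguish a flag that is honoured from one that is ignored.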
@alessiamarcolini As I said above, I unfortunately can't share any images but I'm happy to help if I can (please note though I'm a pathologist rather than a developer...!). It seems existing image tiling methods which support large ndpi images are thin on the ground so it would be awesome if yours could work!