tiatoolbox
tiatoolbox copied to clipboard
Add support for `ignore_is_tiled_tiff` in `WSIReader.open()`
A possible solution for the issue https://github.com/TissueImageAnalytics/tiatoolbox/issues/804
Modification: Added a flag ignore_is_tiled_tiff
to the open()
method of WSIReader
from tiatoolbox/wsicore/wsireader.py
Usecase: if you know that openslide worked well on your ".tif" or ".tiff" file, but your file does not pass the is_tiled_tiff()
check, you might want to ignore this check to be able to used use the functionality of tiatoolbox
that openslide does not provide.
Ruff: To satisfy ruff checks, FBT001 and FBT002, I made ignore_is_tiled_tiff
a keyword-only argument as suggested here: https://docs.astral.sh/ruff/rules/boolean-type-hint-positional-argument/
Tests: I am unsure how to write new tests - will need an example of such a ".tif" file, for which it's ok to ignore the is_tiled_tiff()
check. I ran pytest tests/test_wsireader.py
- all tests passed.
Thanks @GeorgeBatch. That's great. If you can provide us with a sample tif then we can upload this to our servers and set up automated tests. We can also investigate if there is another work around otherwise we can add tests and accept this PR.
Please can you provide a onedrive/dropbox/gdrive link to sample tif file and send it to [email protected] so we can upload to tiatest server. Please make sure that the test file is shared under BSD-3 license or an open license otherwise we will not be able to upload the file.
If you can find a file which is small in size that would be useful as it would mean tests run fast on GitHub actions.
@shaneahmed, I will try to get you access to one of our images as soon as possible. They come from an ongoing study so I might need to request a small dummy image scanned with the same scanner.
@shaneahmed, I will try to get you access to one of our images as soon as possible. They come from an ongoing study so I might need to request a small dummy image scanned with the same scanner.
A dummy slide would be great.
Updated branch with develop to fix integration with codecov.
Codecov Report
Attention: Patch coverage is 75.00000%
with 2 lines
in your changes are missing coverage. Please review.
Project coverage is 99.87%. Comparing base (
ec5a373
) to head (5b488c7
).
Files | Patch % | Lines |
---|---|---|
tiatoolbox/wsicore/wsireader.py | 75.00% | 0 Missing and 2 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #807 +/- ##
===========================================
- Coverage 99.89% 99.87% -0.03%
===========================================
Files 69 69
Lines 8574 8577 +3
Branches 1640 1642 +2
===========================================
+ Hits 8565 8566 +1
Misses 1 1
- Partials 8 10 +2
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@shaneahmed, I will share a dummy slide with you as soon as I have it. Hopefully, this week.
@shaneahmed, I shared some samples through OneDrive with [email protected]. I made slides myself using flower petals so they can be shared without restrictions.
Should the argument ignore_is_tiled_tiff
be also part of the __init__()
method of the WSIReader
class?
Should the argument
ignore_is_tiled_tiff
be also part of the__init__()
method of theWSIReader
class?
The best way would be automatically identify metadata from the file and then read. If that's not feasible then we can consider adding ignore_is_tiled_tiff
as keyword argument. I have requested access to the files by email.
The test file is uploaded to https://tiatoolbox.dcs.warwick.ac.uk/sample_wsis/ventana_sample.tif
You can update https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/tiatoolbox/data/remote_samples.yaml
to add this for tests.
@shaneahmed I added the file to the registry. I do not think Ventana scanners always output .tif
files. I think it was just set up like that for us. So maybe it would be good to name it sample-ventana-tif
I have 2 questions about this code https://github.com/TissueImageAnalytics/tiatoolbox/blob/ec5a373054eec7ffef5b93e37c79d8716ccef92b/tiatoolbox/wsicore/wsireader.py#L300-L307
- What's the reason for not including
.tif
as a suffix to be processed withTIFFWSIReader
on line 300? - Why is
OpenSlideWSIReader
used as default instead ofTIFFWSIReader
, which is only called on ifOpenSlideWSIReader
fails?
I am unsure which of the cases to use if the flag ignore_is_tiled_tiff
is not desirable.
“vendor” field in info is empty, so can’t use it for deciding which backend to use for WSIReader.
I tried adding the new slide to the fixture list with OpenSlideWSIreader
as the expected reader:
...
{
"reader_class": OpenSlideWSIReader,
"sample_key": "ventana-tif",
"kwargs": {},
},
...
ids=[
...
"OpenSlideReader (Ventana non-tiled tif)",
...
https://github.com/TissueImageAnalytics/tiatoolbox/blob/ec5a373054eec7ffef5b93e37c79d8716ccef92b/tests/test_wsireader.py#L2512-L2573
But the test_base_open()
test fails with
======================================================================================================================================== FAILURES ========================================================================================================================================
________________________________________________________________________________________________________________ test_base_open[OpenSlideReader (Ventana non-tiled tif)] _________________________________________________________________________________________________________________
wsi = <tiatoolbox.wsicore.wsireader.OpenSlideWSIReader object at 0x16e0c7a90>
def test_base_open(wsi: WSIReader) -> None:
"""Checks that WSIReader.open detects the type correctly."""
new_wsi = WSIReader.open(wsi.input_path)
> assert type(new_wsi) is type(wsi)
E AssertionError: assert <class 'tiatoolbox.wsicore.wsireader.VirtualWSIReader'> is <class 'tiatoolbox.wsicore.wsireader.OpenSlideWSIReader'>
E + where <class 'tiatoolbox.wsicore.wsireader.VirtualWSIReader'> = type(<tiatoolbox.wsicore.wsireader.VirtualWSIReader object at 0x3189f3050>)
E + and <class 'tiatoolbox.wsicore.wsireader.OpenSlideWSIReader'> = type(<tiatoolbox.wsicore.wsireader.OpenSlideWSIReader object at 0x16e0c7a90>)
tests/test_wsireader.py:2611: AssertionError
because, unlike in test_wsireader_open()
, I do not know how to pass the ignore_is_tiled_tiff
argument within the pytest fixture (and I am not sure it should be passed), so the is_tiled_tiff
check is performed, fails, and VirtualWSIReader is used.
wsi = WSIReader.open(sample_ventana_tif, ignore_is_tiled_tiff=True)
assert isinstance(wsi, wsireader.OpenSlideWSIReader)
@shaneahmed I added the file to the registry. I do not think Ventana scanners always output
.tif
files. I think it was just set up like that for us. So maybe it would be good to name itsample-ventana-tif
I have 2 questions about this code
https://github.com/TissueImageAnalytics/tiatoolbox/blob/ec5a373054eec7ffef5b93e37c79d8716ccef92b/tiatoolbox/wsicore/wsireader.py#L300-L307
- What's the reason for not including
.tif
as a suffix to be processed withTIFFWSIReader
on line 300?- Why is
OpenSlideWSIReader
used as default instead ofTIFFWSIReader
, which is only called on ifOpenSlideWSIReader
fails?I am unsure which of the cases to use if the flag
ignore_is_tiled_tiff
is not desirable.“vendor” field in info is empty, so can’t use it for deciding which backend to use for WSIReader.
The answer to both your questions is the same. When we wrote the code tifffile
had not good support for WSIs. It still has very limited documentation. OpenSlide is preferred library to read WSIs by researchers. So most researchers will find it easy to port their code to tiatoolbox and integrate into their pipelines. If OpenSlide fails then the fallback option is tifffile.
I tried adding the new slide to the fixture list with
OpenSlideWSIreader
as the expected reader:... { "reader_class": OpenSlideWSIReader, "sample_key": "ventana-tif", "kwargs": {}, }, ... ids=[ ... "OpenSlideReader (Ventana non-tiled tif)", ...
https://github.com/TissueImageAnalytics/tiatoolbox/blob/ec5a373054eec7ffef5b93e37c79d8716ccef92b/tests/test_wsireader.py#L2512-L2573
But the
test_base_open()
test fails with======================================================================================================================================== FAILURES ======================================================================================================================================== ________________________________________________________________________________________________________________ test_base_open[OpenSlideReader (Ventana non-tiled tif)] _________________________________________________________________________________________________________________ wsi = <tiatoolbox.wsicore.wsireader.OpenSlideWSIReader object at 0x16e0c7a90> def test_base_open(wsi: WSIReader) -> None: """Checks that WSIReader.open detects the type correctly.""" new_wsi = WSIReader.open(wsi.input_path) > assert type(new_wsi) is type(wsi) E AssertionError: assert <class 'tiatoolbox.wsicore.wsireader.VirtualWSIReader'> is <class 'tiatoolbox.wsicore.wsireader.OpenSlideWSIReader'> E + where <class 'tiatoolbox.wsicore.wsireader.VirtualWSIReader'> = type(<tiatoolbox.wsicore.wsireader.VirtualWSIReader object at 0x3189f3050>) E + and <class 'tiatoolbox.wsicore.wsireader.OpenSlideWSIReader'> = type(<tiatoolbox.wsicore.wsireader.OpenSlideWSIReader object at 0x16e0c7a90>) tests/test_wsireader.py:2611: AssertionError
because, unlike in
test_wsireader_open()
, I do not know how to pass theignore_is_tiled_tiff
argument within the pytest fixture (and I am not sure it should be passed), so theis_tiled_tiff
check is performed, fails, and VirtualWSIReader is used.wsi = WSIReader.open(sample_ventana_tif, ignore_is_tiled_tiff=True) assert isinstance(wsi, wsireader.OpenSlideWSIReader)
After a bit of play around with your image. I think I have a more minimalistic solution which will not change the API. If you make the following change in the code, then we do not need a flag ignore_is_tiled_tiff
.
If you change the line 309 with this code, tiatoolbox should work on ventanna scanners.
if last_suffix in (".tif", ".tiff"):
if openslide.OpenSlide.detect_format(input_path) is not None:
return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
if is_tiled_tiff(input_path):
return TIFFWSIReader(input_path, mpp=mpp, power=power)
If you still want to force TIFFWSIReader then there are two ways:
- Use TIFFWSIReader() to open the image.
- We change the API to allow specifying which class to open the file. But it would be redundant considering we can open an image directly by specifying a class.
Some tests may fail after this change but we can fix those easily, I think.
Just replacing the old code with your proposed code results in the following error:
========================================================================================== FAILURES ===========================================================================================
__________________________________________________________________________________ test_tiled_tiff_tifffile ___________________________________________________________________________________
remote_sample = <function remote_sample.<locals>.__remote_sample at 0x30d98fce0>
def test_tiled_tiff_tifffile(remote_sample: Callable) -> None:
"""Test fallback to tifffile for files which openslide cannot read.
E.G. tiled tiffs with JPEG XL compression.
"""
sample_path = remote_sample("tiled-tiff-1-small-jp2k")
> wsi = wsireader.WSIReader.open(sample_path)
tests/test_wsireader.py:2005:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tiatoolbox/wsicore/wsireader.py:305: in open
return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
tiatoolbox/wsicore/wsireader.py:1663: in __init__
self.openslide_wsi = openslide.OpenSlide(filename=str(self.input_path))
/Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/openslide/__init__.py:179: in __init__
self._osr = lowlevel.open(str(filename))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
result = 5811949952, _func = <_FuncPtr object at 0x30c3041d0>
_args = ('/private/var/folders/lz/602r_tbj50jf1yw4jnmxbg9r0000gn/T/pytest-of-gbatch/pytest-17/data0/CMU-1-Small-Region.jp2k.tiff',)
def _check_open(result, _func, _args):
if result is None:
raise OpenSlideUnsupportedFormatError("Unsupported or missing image file")
slide = _OpenSlide(c_void_p(result))
err = get_error(slide)
if err is not None:
> raise OpenSlideError(err)
E openslide.lowlevel.OpenSlideError: Unsupported TIFF compression: 33004
/Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/openslide/lowlevel.py:207: OpenSlideError
====================================================================================== warnings summary =======================================================================================
../../../../../opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/_pytest/config/__init__.py:1439
/Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/_pytest/config/__init__.py:1439: PytestConfigWarning: Unknown config option: collect_ignore
self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
tests/test_wsireader.py::test_convert_resolution_units
/Users/gbatch/Developer/projects/current/comp-path/tiatoolbox/tiatoolbox/wsicore/wsireader.py:888: RuntimeWarning: divide by zero encountered in scalar divide
output_dict["baseline"] = baseline_mpp[0] / output_dict["mpp"][0]
tests/test_wsireader.py::test_tissue_mask_otsu
/Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/defusedxml/__init__.py:30: DeprecationWarning: defusedxml.cElementTree is deprecated, import from defusedxml.ElementTree instead.
from . import cElementTree
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================================== short test summary info ===================================================================================
FAILED tests/test_wsireader.py::test_tiled_tiff_tifffile - openslide.lowlevel.OpenSlideError: Unsupported TIFF compression: 33004
The image 'CMU-1-Small-Region.jp2k.tiff' passes the openslide.OpenSlide.detect_format(input_path) check, but openslide fails to open it.
if openslide.OpenSlide.detect_format(input_path) is not None:
return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
To mitigate this, a try-except block can be used:
if last_suffix in (".tif", ".tiff"):
if openslide.OpenSlide.detect_format(input_path) is not None:
try:
return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
except openslide.OpenSlideError:
pass
if is_tiled_tiff(input_path):
return TIFFWSIReader(input_path, mpp=mpp, power=power)
To preserve as much of the original behaviour as possible, these lines need to be outside of the except block.
if is_tiled_tiff(input_path):
return TIFFWSIReader(input_path, mpp=mpp, power=power)
All tests pass with this change. But this change resulted in some failed ruff checks - the WSIReader.open()
becomes too complex. So I used git commit --no-verify
so you can see the changes.
tiatoolbox/wsicore/wsireader.py:231:9: PLR0912 Too many branches (14 > 12)
tiatoolbox/wsicore/wsireader.py:231:9: C901 `open` is too complex (15 > 14)
Found 2 errors.
Should I add PLR0912 and C901 to the noqa of open? I know it's not a good strategy to ignore linters whenever they complain. Now:
def open( # noqa: PLR0911
Can be:
def open( # noqa: PLR0911, PLR0912, C901
All tests pass with this change. But this change resulted in some failed ruff checks - the
WSIReader.open()
becomes too complex. So I usedgit commit --no-verify
so you can see the changes.tiatoolbox/wsicore/wsireader.py:231:9: PLR0912 Too many branches (14 > 12) tiatoolbox/wsicore/wsireader.py:231:9: C901 `open` is too complex (15 > 14) Found 2 errors.
Should I add PLR0912 and C901 to the noqa of open? I know it's not a good strategy to ignore linters whenever they complain. Now:
def open( # noqa: PLR0911
Can be:
def open( # noqa: PLR0911, PLR0912, C901
Please go ahead and add these. I can fix these later in a separate PR.
Just replacing the old code with your proposed code results in the following error:
========================================================================================== FAILURES =========================================================================================== __________________________________________________________________________________ test_tiled_tiff_tifffile ___________________________________________________________________________________ remote_sample = <function remote_sample.<locals>.__remote_sample at 0x30d98fce0> def test_tiled_tiff_tifffile(remote_sample: Callable) -> None: """Test fallback to tifffile for files which openslide cannot read. E.G. tiled tiffs with JPEG XL compression. """ sample_path = remote_sample("tiled-tiff-1-small-jp2k") > wsi = wsireader.WSIReader.open(sample_path) tests/test_wsireader.py:2005: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ tiatoolbox/wsicore/wsireader.py:305: in open return OpenSlideWSIReader(input_path, mpp=mpp, power=power) tiatoolbox/wsicore/wsireader.py:1663: in __init__ self.openslide_wsi = openslide.OpenSlide(filename=str(self.input_path)) /Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/openslide/__init__.py:179: in __init__ self._osr = lowlevel.open(str(filename)) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ result = 5811949952, _func = <_FuncPtr object at 0x30c3041d0> _args = ('/private/var/folders/lz/602r_tbj50jf1yw4jnmxbg9r0000gn/T/pytest-of-gbatch/pytest-17/data0/CMU-1-Small-Region.jp2k.tiff',) def _check_open(result, _func, _args): if result is None: raise OpenSlideUnsupportedFormatError("Unsupported or missing image file") slide = _OpenSlide(c_void_p(result)) err = get_error(slide) if err is not None: > raise OpenSlideError(err) E openslide.lowlevel.OpenSlideError: Unsupported TIFF compression: 33004 /Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/openslide/lowlevel.py:207: OpenSlideError ====================================================================================== warnings summary ======================================================================================= ../../../../../opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/_pytest/config/__init__.py:1439 /Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/_pytest/config/__init__.py:1439: PytestConfigWarning: Unknown config option: collect_ignore self._warn_or_fail_if_strict(f"Unknown config option: {key}\n") tests/test_wsireader.py::test_convert_resolution_units /Users/gbatch/Developer/projects/current/comp-path/tiatoolbox/tiatoolbox/wsicore/wsireader.py:888: RuntimeWarning: divide by zero encountered in scalar divide output_dict["baseline"] = baseline_mpp[0] / output_dict["mpp"][0] tests/test_wsireader.py::test_tissue_mask_otsu /Users/gbatch/opt/miniconda3/envs/tiatoolbox-dev/lib/python3.11/site-packages/defusedxml/__init__.py:30: DeprecationWarning: defusedxml.cElementTree is deprecated, import from defusedxml.ElementTree instead. from . import cElementTree -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =================================================================================== short test summary info =================================================================================== FAILED tests/test_wsireader.py::test_tiled_tiff_tifffile - openslide.lowlevel.OpenSlideError: Unsupported TIFF compression: 33004
The image 'CMU-1-Small-Region.jp2k.tiff' passes the openslide.OpenSlide.detect_format(input_path) check, but openslide fails to open it.
if openslide.OpenSlide.detect_format(input_path) is not None: return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
To mitigate this, a try-except block can be used:
if last_suffix in (".tif", ".tiff"): if openslide.OpenSlide.detect_format(input_path) is not None: try: return OpenSlideWSIReader(input_path, mpp=mpp, power=power) except openslide.OpenSlideError: pass if is_tiled_tiff(input_path): return TIFFWSIReader(input_path, mpp=mpp, power=power)
To preserve as much of the original behaviour as possible, these lines need to be outside of the except block.
if is_tiled_tiff(input_path): return TIFFWSIReader(input_path, mpp=mpp, power=power)
I am OK with this solution. However, I think we should not pass
if last_suffix in (".tif", ".tiff"):
if openslide.OpenSlide.detect_format(input_path) is not None:
try:
return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
except openslide.OpenSlideError:
pass
if is_tiled_tiff(input_path):
return TIFFWSIReader(input_path, mpp=mpp, power=power)
if last_suffix in (".tif", ".tiff"):
if openslide.OpenSlide.detect_format(input_path) is not None:
try:
return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
except openslide.OpenSlideError:
if is_tiled_tiff(input_path):
return TIFFWSIReader(input_path, mpp=mpp, power=power)
I considered this option. My argument against was that it will make the logic different from what it was.
Before, to be opened with TIFFWSIReader
, the file needed to satisfy
-
last_suffix in (".tif", ".tiff")
-
is_tiled_tiff(input_path)
- OpenSlideWSIReader failed with openslide.OpenSlide error
This behaviour is preserved in the current option with pass
.
If we put
if is_tiled_tiff(input_path):
return TIFFWSIReader(input_path, mpp=mpp, power=power)
into the except
statement instead of the pass
, to be opened with TIFFWSIReader
, the file will still need to satisfy the 3 criteria as before, but in addition will need to satisfy openslide.OpenSlide.detect_format(input_path) is not None
.
So, for a slide that has a non-detectable format with OpenSlide (openslide.OpenSlide.detect_format(input_path)
), we will not even try to open it with TIFFWSIReader
.
Why is this check needed before trying OpenSlide?
if openslide.OpenSlide.detect_format(input_path) is not None:
What do you think about the option below? This option leaves the same 3 requirements for using TIFFWSIReader
as in the current version on the develop
branch. Trying to open with OpenSlideWSIReader
becomes easier since the file does not need to satisfy is_tiled_tiff(input_path)
.
if last_suffix in (".tif", ".tiff"):
try:
return OpenSlideWSIReader(input_path, mpp=mpp, power=power)
except openslide.OpenSlideError:
if is_tiled_tiff(input_path):
return TIFFWSIReader(input_path, mpp=mpp, power=power)
This version passes all the existing tests.
if openslide.OpenSlide.detect_format(input_path) is not None:
OpenSlide will still be able to open regular tiff which is not a WSI. In this case, metadata will be missing and we may get unintentional errors which are not traceable. Regular tiff file should be opened by VirtualWSIReader which can deal with regular images.