spatialdata icon indicating copy to clipboard operation
spatialdata copied to clipboard

relabel block

Open ArneDefauw opened this issue 1 year ago • 5 comments

Follow up PR for https://github.com/scverse/spatialdata/pull/588, see discussion https://github.com/scverse/spatialdata/issues/597 for relabeling in case output data is a labels layer (i.e. dims does not contain c) via use of a bit shift.

ArneDefauw avatar Aug 07 '24 13:08 ArneDefauw

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.83%. Comparing base (8c12732) to head (4f06d59). Report is 67 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_core/operations/map.py 97.61% 1 Missing :warning:
src/spatialdata/_types.py 50.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
+ Coverage   91.80%   91.83%   +0.02%     
==========================================
  Files          45       45              
  Lines        6959     7002      +43     
==========================================
+ Hits         6389     6430      +41     
- Misses        570      572       +2     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 96.29% <100.00%> (ø)
src/spatialdata/_core/operations/map.py 97.67% <97.61%> (-0.11%) :arrow_down:
src/spatialdata/_types.py 69.23% <50.00%> (-3.50%) :arrow_down:

codecov[bot] avatar Aug 07 '24 13:08 codecov[bot]

Now also added helper function _relabel_sequential, see https://github.com/ArneDefauw/spatialdata/blob/633a132ddfb6353925bc099114b893df43ad3f2e/src/spatialdata/_core/operations/map.py#L211, as discussed here https://github.com/scverse/spatialdata/issues/597#issuecomment-2225678023

I think the PR is ready for review now.

ArneDefauw avatar Aug 08 '24 09:08 ArneDefauw

hi @ArneDefauw , thanks a lot, super useful. I went back to this function in squidpy, and remember that we also did an additional step to harmonize the labels with dask image

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

any reason for leaving this out? Otherwise for me it is good to go!

Hi @giovp ,

thanks for the review!

Yes, I did not include the step to harmonize the labels across chunks due to some issues I observed when cell density is high, see also my previous comments. https://github.com/scverse/spatialdata/issues/597#issuecomment-2213297632 https://github.com/scverse/spatialdata/issues/597#issuecomment-2224762535

I make it a little bit more concrete with an example:

So this is map_raster on a labels layer without harmonizing labels:

#I use the harpy package (https://github.com/saeyslab/harpy) for downloading of some test data
from sparrow.datasets import mibi_example
sdata = mibi_example()
def copy_labels( arr ):
    return arr

results=map_raster( sdata[ "masks_whole" ].chunk( 212 ), func=copy_labels, blockwise=True, relabel=True, )
from spatialdata import read_zarr

sdata[ "masks_whole_copied_relabeled_no_merging" ] = results
sdata.write_element( "masks_whole_copied_relabeled_no_merging" )
sdata = read_zarr( sdata.path )

Then without using the harmonizing of labels I get: Screenshot 2024-08-26 at 11 35 57

and with harmonizing of the labels using

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

I get:

Screenshot 2024-08-26 at 11 30 33

Which is not great, cells on the boundary of the chunks are merged together.

A solution for this, is using this approach: https://github.com/gletort/NeubiasPasteur2023_AdvancedCellPose/blob/main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z for merging chunks back together. Also implemented here https://github.com/saeyslab/harpy/blob/e4966aaf12ff0c36e301ff27df675c2c96f87ed2/src/sparrow/image/segmentation/_segmentation.py#L494, which fixes much of these issues.

But I do not know if you want this to be included in this PR, see also comment of @LucaMarconato https://github.com/scverse/spatialdata/issues/597#issuecomment-2225678023. I would be happy to prepare a follow up PR.

ArneDefauw avatar Aug 26 '24 09:08 ArneDefauw

hi @ArneDefauw , sorry for late reply.

I took a look at

gletort/NeubiasPasteur2023_AdvancedCellPose@main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z

it's a nice approach I think, and I wonder if it should be included already here. Meaning, I am not sure how useful it is the output of relabelling without the across-block label harmonization. Yes, relabelling fix one step of the chunked segmentation, but the result is not really actionable/useful right? Or am I missing something?

also @LucaMarconato wdyt? you can take a look at the output of the excellent post from @ArneDefauw before to understand visually what is the problem

giovp avatar Sep 02 '24 16:09 giovp

hi @ArneDefauw , sorry for late reply.

I took a look at

gletort/NeubiasPasteur2023_AdvancedCellPose@main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z

it's a nice approach I think, and I wonder if it should be included already here. Meaning, I am not sure how useful it is the output of relabelling without the across-block label harmonization. Yes, relabelling fix one step of the chunked segmentation, but the result is not really actionable/useful right? Or am I missing something?

also @LucaMarconato wdyt? you can take a look at the output of the excellent post from @ArneDefauw before to understand visually what is the problem

Yes, indeed, if you do not know how to do the across-block label harmonization, the result will not be very useful. On the other hand, there are probably many ways to do it. Maybe spatialdata should at least implement one method to do this label harmonization. But I think it is up to you guys to decide :)

ArneDefauw avatar Sep 17 '24 15:09 ArneDefauw

@ArneDefauw thanks for the PR and apologies for taking some time to get to it. After discussing with @LucaMarconato and @berombau we decided to put the label block harmonization in squidpy as this library is for having user friendly operations like this. We do not necessarily want the user to have to think about blocks and the implications of that, rather the average user just wants a function relabel.

melonora avatar Nov 27 '24 10:11 melonora

@ArneDefauw thanks for the PR and apologies for taking some time to get to it. After discussing with @LucaMarconato and @berombau we decided to put the label block harmonization in squidpy as this library is for having user friendly operations like this. We do not necessarily want the user to have to think about blocks and the implications of that, rather the average user just wants a function relabel.

Hi @melonora , that makes sense, thanks for having a look at the PR!

ArneDefauw avatar Nov 27 '24 12:11 ArneDefauw

I adjusted the error message a tiny bit. While the function is private, it is called by public API. A less experienced programmer might not know about bit shift.

melonora avatar Nov 27 '24 13:11 melonora

Also made relabel_sequential public given that @ArneDefauw put it as suggestion in the error message which in itself is in a private function, but the error message can be shown by calling a public function.

melonora avatar Nov 27 '24 13:11 melonora

Thanks @ArneDefauw for the PR and @melonora for addressing the review comments. We are good to merge!

LucaMarconato avatar Nov 27 '24 17:11 LucaMarconato