spatialdata icon indicating copy to clipboard operation
spatialdata copied to clipboard

Apply function

Open ArneDefauw opened this issue 2 years ago • 1 comments

Implementation of an apply function for image layers in spatialdata object.

Basic usage is similar to squidpy’s apply function,

E.g.:

def _multiply(image, parameter=2):
    return image * parameter

fn_kwargs = {"parameter": 4}
sdata_blobs = sdata_blobs.apply(
        func=_multiply,
        fn_kwargs=fn_kwargs,
        img_layer="blobs_image",
        output_layer="blobs_apply",
    )

There is the option to process different channels and/or z stacks with different callable/parameters, e.g. via fn_kwargs=( 0: ”parameter”: 4, 1: “parameter”: 5 ), then channels 0 and 1 will be processed with different parameters. If channels/z-stacks should be processed with different parameters, we require setting combine_c/combine_z to False.

On the other hand, setting specific callable/parameters for different channels, e.g. fn_kwargs=( 0: ”parameter”: 4, 1: “parameter”: 5 ) is not allowed when setting combine_c to True, same for z-stacks.

I included this combine_c/combine_z flag in order to be clear about what is the required input/output dimension for the Callable.

E.g. for 2D, If combine_c==True —> input is c,y,x, else it is 1,y,x. (I am still not sure if it is clearer for the user if we squeeze the c dimension in the latter case)

To the best of my knowledge, in squidpy it was not possible to specify different callable/parameter for different channels, although a single channel could be specified.

In the docstring I included some more examples, I also refer to the unit tests.

Another parameter that was added to the apply function is the output_chunks parameter. This parameter is passed as chunks to the map_overlap/map_blocks (if chunksparameter passed to apply is not None). This allows the user to alter the output shape of the arrays (z,y,x). z dimension can only be altered if combine_z is set to True, which makes sense to me. Channel dimension can not be changed at the moment. I am unsure if we should support this.

For multiscale images, there is still an issue. Although the apply function works in this case, it will redo the calculation for each scale. I think this recalculation is triggered by

spatialdata.models.Image2DModel.parse(
            arr,
            dims=se.dims,
            scale_factors=scale_factors,
            chunks=arr.chunksize,
            c_coords=channel,
        )

For spatialdata objects that are not backed by a zarr store, a solution is to do arr.persist() before adding it as an image layer to the spatialdata object. For backed sdata, a solution is to do the calculation on scale0, write to an intermediate image layer, an then add it as multiscale. But I am unsure If we should try to fix this in the apply function.

Currently only image layer to image layer is supported.

I think image-layer -> labels-layer is basically segmentation. And this requires some more complex steps, such as a bit shift such that labels in each chunks are unique + clever glueing the labels from each chunk back together - a simple map_overlap is not sufficient.

Labels-layer -> labels-layer is basically the same story. Whatever non-trivial processing you do, you need more than a map_overlap/map_blocks. But maybe I have overlooked something, and there is a use case here.

I have added some unit tests. However, I did not test the case where there is a z-dimension, because I could not find an sdata with c,z,y,x, image layers in the pytest fixtures.

ArneDefauw avatar Nov 24 '23 15:11 ArneDefauw

Codecov Report

Attention: Patch coverage is 74.67249% with 58 lines in your changes missing coverage. Please review.

Project coverage is 90.82%. Comparing base (bf3377f) to head (4df226c). Report is 171 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_core/operations/apply.py 73.63% 58 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #407      +/-   ##
==========================================
- Coverage   91.57%   90.82%   -0.76%     
==========================================
  Files          36       37       +1     
  Lines        4938     5166     +228     
==========================================
+ Hits         4522     4692     +170     
- Misses        416      474      +58     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 100.00% <100.00%> (ø)
src/spatialdata/_core/spatialdata.py 93.96% <100.00%> (+0.07%) :arrow_up:
src/spatialdata/_core/operations/apply.py 73.63% <73.63%> (ø)

codecov[bot] avatar Nov 24 '23 15:11 codecov[bot]

use cases:

  • filters images -> images
  • apply segmentation algorithms to chunks of raster images images -> labels
  • region props/featurization labels, images -> table
  • apply labels layers to labels layers (merging, expanding masks) labels -> labels

giovp avatar Jun 11 '24 12:06 giovp

We discussed the PR in person today. Here is a simplified signature that would not sacrifice use cases but would simplify the implementation.

def process_raster_blockwise(
    data: SpatialImage | MultiscaleSpatialImage,
    func: Callable[..., NDArray | Array] | Mapping[str, Any],
    fn_kwargs: Mapping[str, Any] = MappingProxyType({}),
    combine_c: bool = True,  # TODO: code review
    combine_z: bool = True,  # TODO: code review
    chunks: Optional[str | int | Tuple[int, ...]] = None,  # TODO: explain
    output_chunks: Optional[Tuple[Tuple[int, ...], ...]] = None,
    depth: int | tuple[int, ...] | dict[int, int] = 0,  # specify the overlap between tiles; if 0, map_blocks, if != 0, map_overlap
    **kwargs: Any,
) -> SpatialImage | MultiscaleSpatialImage:
    pass

Comments:

  • [x] it could be that some of combine_c, combine_z and chunks can be removed. We could check this after a refactoring or during code review.
  • [x] The new parameter depth should be documented, explaining how it changes the internal behaviors (map_blocks vs map_overlap).

A joint code review from @giovp and me will follow next week.

LucaMarconato avatar Jun 11 '24 13:06 LucaMarconato

Following our discussion, I created a new draft PR that removes much of the complexity introduced in this PR, see https://github.com/scverse/spatialdata/pull/588

ArneDefauw avatar Jun 17 '24 12:06 ArneDefauw

Thanks for the new PR @ArneDefauw, I find the new code clean and well structured. I am currently reviewing it, I will upload the review later. For the moment I am closing this PR as the new PR will replace this.

LucaMarconato avatar Jun 18 '24 11:06 LucaMarconato