Apply function
Implementation of an `apply` function for image layers of a SpatialData object.
Basic usage is similar to squidpy's `apply` function, e.g.:
```python
def _multiply(image, parameter=2):
    return image * parameter


fn_kwargs = {"parameter": 4}

sdata_blobs = sdata_blobs.apply(
    func=_multiply,
    fn_kwargs=fn_kwargs,
    img_layer="blobs_image",
    output_layer="blobs_apply",
)
```
There is the option to process different channels and/or z-stacks with different callables/parameters, e.g. via `fn_kwargs={0: {"parameter": 4}, 1: {"parameter": 5}}`, in which case channels 0 and 1 are processed with different parameters. If channels/z-stacks should be processed with different parameters, we require setting `combine_c`/`combine_z` to `False`.
Conversely, specifying per-channel callables/parameters, e.g. `fn_kwargs={0: {"parameter": 4}, 1: {"parameter": 5}}`, is not allowed when `combine_c` is set to `True`; the same holds for z-stacks.
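For example, a minimal sketch assuming the `apply` API and the `_multiply` function defined above (the output layer name is hypothetical):

```python
# Per-channel parameters: channel 0 is multiplied by 4, channel 1 by 5.
# combine_c must be False so each channel is passed to func separately.
fn_kwargs = {0: {"parameter": 4}, 1: {"parameter": 5}}

sdata_blobs = sdata_blobs.apply(
    func=_multiply,
    fn_kwargs=fn_kwargs,
    img_layer="blobs_image",
    output_layer="blobs_apply_per_channel",
    combine_c=False,
)
```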
I included the `combine_c`/`combine_z` flags to make explicit what the required input/output dimensions of the `Callable` are.
E.g. for 2D: if `combine_c==True`, the input is `(c, y, x)`; otherwise it is `(1, y, x)`.
(I am still not sure whether it would be clearer for the user if we squeezed the `c` dimension in the latter case.)
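To make this contract concrete, here is a sketch of the two callable variants, based on my reading of the flag for the 2D case:

```python
import numpy as np


# combine_c=True: func sees all channels at once, shape (c, y, x).
def _normalize_across_channels(image: np.ndarray) -> np.ndarray:
    return image / image.sum(axis=0, keepdims=True)


# combine_c=False: func is called once per channel, shape (1, y, x).
def _scale_channel(image: np.ndarray, parameter: float = 1.0) -> np.ndarray:
    return image * parameter
```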
To the best of my knowledge, in squidpy it was not possible to specify different callables/parameters for different channels, although a single channel could be specified.
I included some more examples in the docstring; I also refer to the unit tests.
Another parameter added to the `apply` function is `output_chunks`. It is passed as `chunks` to `map_overlap`/`map_blocks` (if the `chunks` parameter passed to `apply` is not `None`). This allows the user to alter the output shape of the arrays `(z, y, x)`. The z dimension can only be altered if `combine_z` is set to `True`, which makes sense to me.
The channel dimension cannot be changed at the moment. I am unsure whether we should support this.
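A hedged sketch of what this enables (the exact format of `output_chunks` is my assumption; here I pass per-block sizes, as dask's `map_blocks` accepts for its `chunks` argument, and the layer names are hypothetical):

```python
# Hypothetical: downsample y and x by 2, declaring the smaller output
# blocks via output_chunks so the dask graph has correct block sizes.
def _downsample(image):
    return image[..., ::2, ::2]


sdata_blobs = sdata_blobs.apply(
    func=_downsample,
    img_layer="blobs_image",
    output_layer="blobs_downsampled",
    chunks=512,                   # input blocks of 512x512 in (y, x)
    output_chunks=(1, 256, 256),  # c unchanged; y and x halved
)
```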
For multiscale images, there is still an issue: although the `apply` function works in this case, it will redo the calculation for each scale. I think this recalculation is triggered by:
```python
spatialdata.models.Image2DModel.parse(
    arr,
    dims=se.dims,
    scale_factors=scale_factors,
    chunks=arr.chunksize,
    c_coords=channel,
)
```
For SpatialData objects that are not backed by a zarr store, a solution is to call `arr.persist()` before adding the array as an image layer to the SpatialData object. For backed sdata, a solution is to do the calculation on `scale0`, write it to an intermediate image layer, and then add it as multiscale. But I am unsure if we should try to fix this in the `apply` function.
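For illustration, a sketch of the persist workaround (`dims` and `scale_factors` are placeholders here):

```python
from spatialdata.models import Image2DModel

# Materialize the computed blocks once, so parsing into a multiscale
# image does not recompute the callable for every scale.
arr = arr.persist()

se = Image2DModel.parse(
    arr,
    dims=("c", "y", "x"),  # placeholder dims
    scale_factors=[2, 2],  # placeholder pyramid factors
    chunks=arr.chunksize,
)
```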
Currently, only image layer -> image layer is supported.
I think image layer -> labels layer is basically segmentation, and this requires some more complex steps, such as a bit shift so that labels in each chunk are unique, plus cleverly gluing the labels from the chunks back together; a simple `map_overlap` is not sufficient. An illustration of the bit-shift idea follows below.
Labels layer -> labels layer is basically the same story: whatever non-trivial processing you do, you need more than a `map_overlap`/`map_blocks`. But maybe I have overlooked something, and there is a use case here.
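A sketch of the bit-shift step under my own assumptions (not code from this PR):

```python
import dask.array as da
import numpy as np


# Make per-chunk labels globally unique by encoding the flat block
# index in the high bits. Objects crossing chunk borders would still
# need a stitching step, which is why map_overlap alone is not enough.
def _offset_labels(block, block_id=None, num_blocks=None):
    flat = np.ravel_multi_index(block_id, num_blocks)
    return np.where(block > 0, block.astype(np.int64) + (flat << 32), 0)


labels = da.random.randint(0, 5, size=(1024, 1024), chunks=512)
unique = labels.map_blocks(
    _offset_labels, num_blocks=labels.numblocks, dtype=np.int64
)
```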
I have added some unit tests. However, I did not test the case where there is a z dimension, because I could not find an sdata with `(c, z, y, x)` image layers in the pytest fixtures.
Codecov Report
Attention: Patch coverage is 74.67249% with 58 lines in your changes missing coverage. Please review.
Project coverage is 90.82%. Comparing base (bf3377f) to head (4df226c). Report is 171 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spatialdata/_core/operations/apply.py | 73.63% | 58 Missing :warning: |
```diff
@@            Coverage Diff             @@
##             main     #407      +/-   ##
==========================================
- Coverage   91.57%   90.82%    -0.76%
==========================================
  Files          36       37        +1
  Lines        4938     5166      +228
==========================================
+ Hits         4522     4692      +170
- Misses        416      474       +58
```
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/spatialdata/__init__.py | 100.00% <100.00%> (ø) | |
| src/spatialdata/_core/spatialdata.py | 93.96% <100.00%> (+0.07%) | :arrow_up: |
| src/spatialdata/_core/operations/apply.py | 73.63% <73.63%> (ø) | |
Use cases:
- filters: images -> images
- apply segmentation algorithms to chunks of raster images: images -> labels
- region props / featurization: labels, images -> table
- apply labels layers to labels layers (merging, expanding masks): labels -> labels
We discussed the PR in person today. Here is a simplified signature that would not sacrifice use cases but would simplify the implementation.
```python
from types import MappingProxyType
from typing import Any, Callable, Mapping, Optional, Tuple

from dask.array import Array
from multiscale_spatial_image import MultiscaleSpatialImage
from numpy.typing import NDArray
from spatial_image import SpatialImage


def process_raster_blockwise(
    data: SpatialImage | MultiscaleSpatialImage,
    func: Callable[..., NDArray | Array] | Mapping[str, Any],
    fn_kwargs: Mapping[str, Any] = MappingProxyType({}),
    combine_c: bool = True,  # TODO: code review
    combine_z: bool = True,  # TODO: code review
    chunks: Optional[str | int | Tuple[int, ...]] = None,  # TODO: explain
    output_chunks: Optional[Tuple[Tuple[int, ...], ...]] = None,
    # Overlap between tiles; if 0, use map_blocks, if != 0, use map_overlap.
    depth: int | tuple[int, ...] | dict[int, int] = 0,
    **kwargs: Any,
) -> SpatialImage | MultiscaleSpatialImage:
    pass
```
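A hypothetical call matching this proposed signature (the function itself is only a sketch at this point; `sdata_blobs` and the layer name carry over from the examples above):

```python
from scipy import ndimage

# depth != 0 -> map_overlap: each block is smoothed with 4 pixels of
# overlap so the Gaussian filter does not show seams at chunk borders.
smoothed = process_raster_blockwise(
    data=sdata_blobs.images["blobs_image"],
    func=lambda block: ndimage.gaussian_filter(block, sigma=1),
    chunks=512,
    depth=4,
)
```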
Comments:
- [x] It could be that some of `combine_c`, `combine_z` and `chunks` can be removed. We could check this after a refactoring or during code review.
- [x] The new parameter `depth` should be documented, explaining how it changes the internal behavior (`map_blocks` vs `map_overlap`).
A joint code review from @giovp and me will follow next week.
Following our discussion, I created a new draft PR that removes much of the complexity introduced in this PR, see https://github.com/scverse/spatialdata/pull/588
Thanks for the new PR @ArneDefauw, I find the new code clean and well structured. I am currently reviewing it and will upload the review later. For the moment I am closing this PR, as the new one will replace it.