spatialdata Aggregate points by shapes lazily

Issue to keep track of this particular aggregation case.

Currently, when aggregating points by shapes, all the points are loaded into memory. Ideally, this would be done incrementally, keeping the data lazy.
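
To make "incrementally" concrete, here is a minimal pure-Python sketch of the idea, not the spatialdata implementation. It assumes the points arrive as a lazy iterable and, purely for illustration, that each shape is an axis-aligned box; `iter_chunks`, `count_points_by_shape` and the box representation are hypothetical names. Only one chunk of points is held in memory at a time, and only the per-shape running totals persist across chunks.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
# Hypothetical shape representation: (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def iter_chunks(points: Iterable[Point], chunk_size: int) -> Iterator[List[Point]]:
    """Yield successive lists of at most `chunk_size` points from a lazy iterable."""
    chunk: List[Point] = []
    for point in points:
        chunk.append(point)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def count_points_by_shape(
    points: Iterable[Point], shapes: Dict[str, Box], chunk_size: int = 1000
) -> Counter:
    """Count the points falling in each box, processing one chunk at a time."""
    totals = Counter({name: 0 for name in shapes})
    for chunk in iter_chunks(points, chunk_size):
        # Only this chunk is materialized; the totals are the only carried state.
        for x, y in chunk:
            for name, (xmin, ymin, xmax, ymax) in shapes.items():
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    totals[name] += 1
    return totals


shapes = {"cell_a": (0.0, 0.0, 1.0, 1.0), "cell_b": (2.0, 2.0, 3.0, 3.0)}
# A generator stands in for a lazily loaded points element.
points = (p for p in [(0.5, 0.5), (2.5, 2.5), (0.1, 0.9), (5.0, 5.0), (2.0, 3.0)])
counts = count_points_by_shape(points, shapes, chunk_size=2)
```

In a real implementation the chunks would presumably correspond to partitions of the lazy points dataframe, and the inner loop would be replaced by a spatial join against the shapes; the accumulation pattern stays the same.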

When implemented, remove the part of the aggregate() docstring that explains the current performance limitation.

Apr 05 '23 12:04 LucaMarconato

One idea would be to let the user decide whether to load the points into memory; once loaded, the points would stay in memory. This is a separate topic that was already mentioned (it would require a change in the schema), and it will be discussed at the next hackathon.

Apr 05 '23 14:04 LucaMarconato

Does this aggregation case also cover something like the following?

sdata.aggregate(values='morphology_focus', by='cell_labels', agg_func='mean')

where sdata looks like:

SpatialData object
├── Images
│     └── 'morphology_focus': DataTree[cyx] (1, 20480, 22787), (1, 10240, 11393), (1, 5120, 5696), (1, 2560, 2848), (1, 1280, 1424)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (20480, 22787), (10240, 11393), (5120, 5696), (2560, 2848), (1280, 1424)
│     └── 'nucleus_labels': DataTree[yx] (20480, 22787), (10240, 11393), (5120, 5696), (2560, 2848), (1280, 1424)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 11) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (85596, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (85596, 2) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (85596, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (85596, 310)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)

Basically, the goal is to aggregate the pixel values of an image by the cell boundaries.
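
To spell out the computation being asked about, here is a minimal pure-Python sketch of a per-label mean that is accumulated tile by tile, so the full image never has to be in memory at once. It is not the spatialdata API: `mean_by_label` is a hypothetical helper, tiles are plain nested lists, and label 0 is assumed to mean background.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ImageTile = List[List[float]]
LabelTile = List[List[int]]


def mean_by_label(tiles: Iterable[Tuple[ImageTile, LabelTile]]) -> Dict[int, float]:
    """Mean pixel value per label, keeping only running sums and counts."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for image_tile, label_tile in tiles:
        for image_row, label_row in zip(image_tile, label_tile):
            for value, label in zip(image_row, label_row):
                if label == 0:  # assumption: 0 marks background pixels
                    continue
                sums[label] += value
                counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Two aligned (image, labels) tiles standing in for chunks of a large image.
tiles = [
    ([[1.0, 2.0], [3.0, 4.0]], [[1, 1], [0, 2]]),
    ([[5.0, 6.0], [7.0, 8.0]], [[2, 2], [1, 0]]),
]
means = mean_by_label(tiles)
```

Because the mean is derived from a sum and a count, the result does not depend on how the image is split into tiles, which is what makes this kind of aggregation suitable for lazy, chunked evaluation.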

Thanks a lot! Regards, Shashank

May 19 '25 19:05 shashkat