spatialdata-io Change points to shapes for stereo-seq

Hi, we use Stereo-seq a lot. Most Stereo-seq users work with the bins rather than the cell_circles, because cell segmentation performs poorly due to limitations of both the experimental technology and the algorithms. Could you change Points to Shapes by default?

Originally posted by @wangjiawen2013 in #97

Jan 03 '25 15:01 LucaMarconato

@wangjiawen2013 thanks for the feedback! The change you suggest is quite straightforward (basically it amounts to replacing this code here https://github.com/scverse/spatialdata-io/blob/92dbde14913531a9d1f43c8cff1a2ccfb077effd/src/spatialdata_io/readers/stereoseq.py#L279 with this one https://github.com/scverse/spatialdata-io/blob/92dbde14913531a9d1f43c8cff1a2ccfb077effd/src/spatialdata_io/readers/visium_hd.py#L252). The problem is that we currently do not support lazy loading for shapes as we do for points, and this would affect performance. There could be a new function argument to allow parsing bins as shapes, while keeping points as the default.

We have limited bandwidth as we are working on some parts of the core library at the moment and won't be able to work on this specific task soon, but if you would like to try making a PR, we will be happy to review it!

Jan 03 '25 15:01 LucaMarconato

Also, please check out this feature https://github.com/scverse/spatialdata/pull/578, and this new one (merged yesterday) https://github.com/scverse/spatialdata/pull/811, as you could find them useful for performant handling of Stereo-seq data.

Jan 03 '25 15:01 LucaMarconato

I tested it and found that rasterize_bins is not a good fit, for two reasons:

1. There are no col_key and row_key in the Stereo-seq object corresponding to array_col and array_row in the Visium object. The x and y columns in the Stereo-seq points element have a different meaning from array_col and array_row in Visium; they are not equivalent.
2. rasterize_bins cannot merge table.obs into an image (especially for discrete annotations), but we want to render both table.obs and gene expression.

I'll add a new function argument to allow parsing bins as shapes, while keeping points as the default. Since most Stereo-seq users only work with bin sizes from 10 to 200, performance should not be seriously affected as long as an appropriate bin size is chosen.

Jan 06 '25 09:01 wangjiawen2013

Thanks for sharing. If performance is not an issue, I'd indeed proceed as you described. Please note that the approach shown here https://github.com/scverse/spatialdata/pull/811 (i.e. calling rasterize(return_region_as_labels=True)) would not be affected by the challenges you mentioned.

Finally, a comment on the challenges you faced.

https://github.com/scverse/spatialdata/pull/578#issuecomment-2167975364

Not having the row/col indices readily available is definitely inconvenient, but they could be reconstructed. If performance starts being an issue when you need to show the smallest bins, you could consider this.
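As a rough illustration of that reconstruction, here is a minimal sketch that derives integer row/col indices from the x/y columns. It assumes x/y are the upper-left corners of bins on a regular grid, and that neighboring bins are `spacing` units apart in the same units as x/y (check this against your data). `reconstruct_bin_indices` is a hypothetical helper, not part of spatialdata or spatialdata-io.

```python
import numpy as np


def reconstruct_bin_indices(x, y, spacing: int) -> tuple[np.ndarray, np.ndarray]:
    """Derive integer (row, col) grid indices for bins from their x/y coordinates.

    Assumption: x/y are upper-left bin corners on a regular grid where neighboring
    bins are `spacing` units apart (hypothetical helper, for illustration only).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # Shift so the top-left bin becomes (0, 0), then divide by the grid spacing.
    col = ((x - x.min()) // spacing).astype(np.int64)
    row = ((y - y.min()) // spacing).astype(np.int64)
    return row, col


# Usage: four bins on a 50-unit grid.
x = np.array([100, 150, 100, 200])
y = np.array([300, 300, 350, 400])
row, col = reconstruct_bin_indices(x, y, spacing=50)
print(row.tolist(), col.tolist())  # [0, 0, 1, 2] [0, 1, 0, 2]
```

The resulting arrays could be stored as obs columns and used where rasterize_bins expects col_key/row_key.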

> rasterize_bins cannot merge table.obs into an image (especially for discrete annotations), but we want to render both table.obs and gene expression.

True, rasterize_bins() currently doesn't support these cases, since during development the focus was on using it with a sparse matrix. Nevertheless, these are important cases, so we will take this feedback into account. A quick workaround for now is to put the obs into the .X of a new table, or the column's .cat.codes in case of a categorical column. By the way, we should add support for adata.layers here too (we recently added it for get_values() https://github.com/scverse/spatialdata/pull/818 after your feedback in spatialdata-plot https://github.com/scverse/spatialdata-plot/issues/326).
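The workaround above can be sketched as encoding one obs column as a dense one-column matrix (integer codes for categoricals) that could then serve as the .X of a new table. `obs_column_as_matrix` is a hypothetical helper using only pandas/NumPy; building the new AnnData table and adding it to the SpatialData object are not shown.

```python
import numpy as np
import pandas as pd


def obs_column_as_matrix(obs: pd.DataFrame, column: str) -> tuple[np.ndarray, list[str]]:
    """Turn one obs column into a dense (n_obs, 1) float32 matrix.

    Categorical columns are replaced by their integer codes, and the returned list
    maps code -> category label so the legend can be recovered later.
    Non-categorical columns are assumed to be numeric (hypothetical helper).
    """
    values = obs[column]
    if isinstance(values.dtype, pd.CategoricalDtype):
        labels = [str(c) for c in values.cat.categories]
        data = values.cat.codes.to_numpy()  # pandas uses -1 for missing values
    else:
        labels = []
        data = values.to_numpy()
    return data.astype(np.float32).reshape(-1, 1), labels


# Usage: a categorical cluster annotation on four bins.
obs = pd.DataFrame(
    {"cluster": pd.Categorical(["b", "a", "b", "c"])},
    index=["bin0", "bin1", "bin2", "bin3"],
)
X, labels = obs_column_as_matrix(obs, "cluster")
print(labels)             # ['a', 'b', 'c']
print(X[:, 0].tolist())   # [1.0, 0.0, 1.0, 2.0]
```

Keeping the code-to-label list next to the matrix is what allows a discrete annotation to be shown with its original category names after rasterization.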

Jan 06 '25 10:01 LucaMarconato

Glad to hear that we can use layers!

We have been cooperating with BGI (the inventor of Stereo-seq) for many years. bin50 to bin200 are used frequently for production purposes, and bin1 is rarely used (only occasionally, for testing purposes). The memory usage and running time are acceptable for bin50 to bin200 (or even bin10) when using scanpy/squidpy/Seurat, so don't worry too much about performance. When we want to use bin1, we can use rasterize as you said. But at the current stage, convenience and functionality are more important than performance.

The following is the envisaged reader function signature:

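```python
from collections.abc import Mapping
from pathlib import Path
from types import MappingProxyType
from typing import Any

from spatialdata import SpatialData


def stereoseq(
    path: str | Path,
    dataset_id: str | None = None,
    read_square_bin: bool = True,
    bin_to_shape: list[int] | int | None = None,  # bin sizes to parse as shapes
    optional_tif: bool = False,
    imread_kwargs: Mapping[str, Any] = MappingProxyType({}),
    image_models_kwargs: Mapping[str, Any] = MappingProxyType({}),
) -> SpatialData:
    ...  # proposed signature only; body omitted
```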

When bin_to_shape is None, all bin sizes >= 10 will be treated as shapes. When it is an int or a list[int] (such as 20 or [20, 50, 100]), only those bin sizes (e.g. 20, 50 and 100) will be treated as shapes. As you're more familiar with spatialdata, I hope you can modify the reader. Although you have pointed out the code that needs to be changed, I find it hard to understand. I am still an ordinary user and have a long way to go before becoming a developer.
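The proposed `bin_to_shape` semantics can be sketched as a standalone helper that decides which bin sizes are parsed as shapes (the rest staying points). `select_shape_bins`, its `available_bins` and `default_min_bin` parameters, and the choice to raise on bin sizes missing from the dataset are assumptions for illustration, not existing spatialdata-io behaviour.

```python
from __future__ import annotations


def select_shape_bins(
    available_bins: list[int],
    bin_to_shape: list[int] | int | None = None,
    default_min_bin: int = 10,
) -> list[int]:
    """Return the bin sizes to parse as shapes; all other bins stay points.

    None -> every available bin size >= default_min_bin;
    int or list[int] -> exactly those bin sizes (hypothetical helper).
    """
    if bin_to_shape is None:
        requested = {b for b in available_bins if b >= default_min_bin}
    elif isinstance(bin_to_shape, int):
        requested = {bin_to_shape}
    else:
        requested = set(bin_to_shape)
    # Fail early instead of silently ignoring bin sizes that are not in the data.
    missing = requested - set(available_bins)
    if missing:
        raise ValueError(f"Bin sizes not present in the dataset: {sorted(missing)}")
    return sorted(requested)


# Usage with an example list of bin sizes.
bins = [1, 2, 5, 10, 20, 50, 100, 200]
print(select_shape_bins(bins))                 # [10, 20, 50, 100, 200]
print(select_shape_bins(bins, 20))             # [20]
print(select_shape_bins(bins, [20, 50, 100]))  # [20, 50, 100]
```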

Jan 07 '25 05:01 wangjiawen2013

Thank you for sharing your experience and the proposed signature. Another question: which version of Stereo-seq data are you currently using? The reader was designed for version 7.0.0.

Jan 14 '25 00:01 LucaMarconato

We're using 7.0.0. We know it has been updated to version 8, but we're hesitating about whether to switch to version 8.

Jan 14 '25 00:01 wangjiawen2013

> We're using 7.0.0. We know it has been updated to version 8, but we're hesitating about whether to switch to version 8.

Stereo-seq FF V1.3 and Stereo-seq FFPE can only be analyzed with version 8, so I recommend version 8.

I'd like to use shapes, but it's worth noting that the x and y columns in Stereo-seq are the coordinates of the upper-left corner of each bin ( tereopy/issues/366 ).
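To make the upper-left-corner convention concrete, here is a minimal sketch that converts bin corners into bin centers and square polygon vertices. It assumes each bin spans `bin_size` units along both axes and that y increases downwards (image convention). `upper_left_to_square` is a hypothetical helper; the vertex arrays could, for example, be wrapped in shapely polygons before building a shapes element, but that step is not shown.

```python
import numpy as np


def upper_left_to_square(x, y, bin_size: float) -> tuple[np.ndarray, np.ndarray]:
    """Return bin centers (n, 2) and square vertices (n, 4, 2) from upper-left corners.

    Assumes each bin spans `bin_size` units in x and y, with y growing downwards
    (hypothetical helper, for illustration only).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # The center is offset by half a bin from the upper-left corner.
    centers = np.column_stack([x + bin_size / 2, y + bin_size / 2])
    # Vertex order: upper-left, upper-right, lower-right, lower-left.
    squares = np.stack(
        [
            np.column_stack([x, y]),
            np.column_stack([x + bin_size, y]),
            np.column_stack([x + bin_size, y + bin_size]),
            np.column_stack([x, y + bin_size]),
        ],
        axis=1,
    )
    return centers, squares


# Usage: two adjacent bin50 bins.
centers, squares = upper_left_to_square([0, 50], [0, 0], bin_size=50)
print(centers.tolist())     # [[25.0, 25.0], [75.0, 25.0]]
print(squares[1].tolist())  # [[50.0, 0.0], [100.0, 0.0], [100.0, 50.0], [50.0, 50.0]]
```

Using the corner-derived centers (rather than the raw x/y) matters if bins are drawn as circles, since otherwise every bin would appear shifted by half a bin.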

Feb 08 '25 07:02 z-spider

Thanks @z-spider for joining the conversation. Maybe an approach worth exploring is to use https://github.com/STOmics/Stereopy internally in our stereoseq() reader, so that we delegate the handling of the most up-to-date format to stereopy. Another could be to work towards a direct converter to SpatialData objects in stereopy.

I won't have the bandwidth to work on this right now, but if someone is interested in exploring the above, please reach out!

Feb 08 '25 15:02 LucaMarconato

No, stereopy is not a good idea. We have used stereopy before. It's hard to install and hard to use. Just ignore it.

Feb 09 '25 11:02 wangjiawen2013

> No, stereopy is not a good idea. We have used stereopy before. It's hard to install and hard to use. Just ignore it.

I agree, stereopy currently only supports Python 3.8.

Feb 10 '25 02:02 z-spider
