spatialdata-io
Change points to shapes for stereo-seq
Hi, we use Stereo-seq a lot. Most Stereo-seq users operate on the bins rather than on the cell_circles, because cell segmentation performs badly owing to limitations of both the experimental technology and the algorithm. Could you change Points to Shapes by default?
Originally posted by @wangjiawen2013 in #97
@wangjiawen2013 thanks for the feedback! The change you suggest is quite straightforward (basically it amounts to replacing this code here https://github.com/scverse/spatialdata-io/blob/92dbde14913531a9d1f43c8cff1a2ccfb077effd/src/spatialdata_io/readers/stereoseq.py#L279 with this one https://github.com/scverse/spatialdata-io/blob/92dbde14913531a9d1f43c8cff1a2ccfb077effd/src/spatialdata_io/readers/visium_hd.py#L252). The problem is that we currently do not support lazy loading for shapes as we do for points, and this would affect performance. There could be a new function argument to allow the parsing of bins as shapes, while keeping the default as points.
We have limited bandwidth as we are working on some parts of the core library at the moment and won't be able to work on this specific task soon, but if you would like to try making a PR, we will be happy to review it!
Also, please check out this feature https://github.com/scverse/spatialdata/pull/578, and this new one (merged yesterday) https://github.com/scverse/spatialdata/pull/811, as you could find them useful for performant handling of Stereo-seq data.
I tested it and found that rasterize_bins is not a good idea, for two reasons:
- There are no col_key and row_key in the stereoseq object corresponding to array_col and array_row in the visium object. The x and y columns in the stereoseq points element have a different meaning from array_col and array_row in visium; they are not equivalent.
- rasterize_bins cannot merge table.obs into an image (especially for discrete annotations), but we want to render both table.obs and gene expression.
I'll define a new function argument to allow the parsing of bins as shapes, while keeping the default as points. Since most stereoseq users only use bin 10-200, the performance should not be seriously affected as long as an appropriate bin size is set.
Thanks for sharing. If performance is not an issue, I'd indeed proceed as you described. Please note that the approach shown here https://github.com/scverse/spatialdata/pull/811 (i.e. calling rasterize(return_region_as_labels=True)) would not be affected by the challenges you mentioned.
Finally a comment on the challenges you faced.
https://github.com/scverse/spatialdata/pull/578#issuecomment-2167975364
Not having the row/col ready is definitely inconvenient, but these could be reconstructed. If performance starts being an issue in case you need to show the smallest bins you could consider this.
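For illustration, here is a minimal sketch of how row/col indices might be reconstructed from the bin coordinates. It assumes that x and y lie on a regular grid with spacing equal to the bin size; the function name reconstruct_row_col and the example values are hypothetical and not part of spatialdata or spatialdata-io.

```python
import numpy as np


def reconstruct_row_col(x, y, bin_size):
    """Map bin coordinates to zero-based integer (row, col) grid indices.

    Assumption: x and y sit on a regular grid whose spacing is `bin_size`.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # Shift so the smallest coordinate becomes index 0, then divide by the spacing.
    col = (x - x.min()) // bin_size
    row = (y - y.min()) // bin_size
    return row.astype(int), col.astype(int)


# Hypothetical bin50 coordinates.
x = np.array([100, 150, 100, 200])
y = np.array([300, 300, 350, 400])
row, col = reconstruct_row_col(x, y, bin_size=50)
```

The resulting arrays could then be stored as columns playing the role of row_key and col_key, provided the grid assumption holds for the dataset at hand.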
rasterize_bin cannot merge table.obs into an image (especially for discrete annotations), but we want render both table.obs and gene expression.
True, rasterize_bins() currently doesn't support these cases, as the focus when we developed it was on use with a sparse matrix. Nevertheless these are important cases, so we will take this feedback into account. A quick workaround for now is to put the obs into a new .X in a new table, or obs.cat.codes in the case of a categorical column. Btw, we should add support for adata.layers here too (we recently added it for get_values() https://github.com/scverse/spatialdata/pull/818 after your feedback in spatialdata-plot https://github.com/scverse/spatialdata-plot/issues/326).
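A rough sketch of the categorical part of this workaround, using only pandas and numpy: the column name "cluster" and the bin names are made up for the example, and wrapping the resulting matrix in a new table is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical obs table with one categorical annotation per bin.
obs = pd.DataFrame(
    {"cluster": pd.Categorical(["T cell", "B cell", "T cell", "stroma"])},
    index=["bin_0", "bin_1", "bin_2", "bin_3"],
)

# Integer codes of the categorical column, one per bin.
codes = obs["cluster"].cat.codes.to_numpy()

# A (n_obs, 1) matrix that could serve as .X of a new table.
X = codes.reshape(-1, 1).astype(np.float32)

# Keep the mapping so the rasterized values can be translated back to labels.
code_to_label = dict(enumerate(obs["cluster"].cat.categories))
```

Keeping code_to_label around matters because the rasterized image would only carry the numeric codes, not the category names.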
Glad to hear that we can use layers !
We have been cooperating with BGI (the inventor of stereoseq) for many years. bin50-bin200 are used frequently for production purposes, and bin1 is rarely used (only occasionally, for testing). Memory use and running time are acceptable for bin50-bin200 (or even bin10) when using scanpy/squidpy/seurat, so don't worry too much about performance. When we want to use bin1, we can use rasterize as you said. But at the current stage, convenience and functionality are more important than performance.
The following is the envisaged reader function signature:
def stereoseq(
    path: str | Path,
    dataset_id: str | None = None,
    read_square_bin: bool = True,
    bin_to_shape: list[int] | int | None = None,
    optional_tif: bool = False,
    imread_kwargs: Mapping[str, Any] = MappingProxyType({}),
    image_models_kwargs: Mapping[str, Any] = MappingProxyType({}),
) -> SpatialData: ...
When bin_to_shape is None, every bin_size >= 10 will be parsed as shapes. When it is an int or a list[int] (such as 20 or [20, 50, 100]), only the listed bin sizes (20, or 20, 50 and 100) will be parsed as shapes.
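The rule described above could be sketched as follows. The helper name bins_parsed_as_shapes and the default_min_bin_size parameter are hypothetical; this only illustrates the proposed semantics and is not code from the reader.

```python
from __future__ import annotations


def bins_parsed_as_shapes(
    available_bin_sizes: list[int],
    bin_to_shape: list[int] | int | None = None,
    default_min_bin_size: int = 10,
) -> list[int]:
    """Return the bin sizes that would be parsed as shapes rather than points."""
    if bin_to_shape is None:
        # Default: every bin size at or above the threshold becomes shapes.
        return sorted(b for b in available_bin_sizes if b >= default_min_bin_size)
    # An int is treated as a single-element list.
    requested = {bin_to_shape} if isinstance(bin_to_shape, int) else set(bin_to_shape)
    return sorted(b for b in available_bin_sizes if b in requested)


available = [1, 10, 20, 50, 100, 200]
default_choice = bins_parsed_as_shapes(available)
single_choice = bins_parsed_as_shapes(available, 20)
list_choice = bins_parsed_as_shapes(available, [20, 50, 100])
```

Under this sketch, bin1 stays as points unless it is requested explicitly, which matches the performance concern raised earlier in the thread.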
As you're more familiar with spatialdata, I hope you can modify the reader. Although you have pointed out the code that needs to be changed, I find it hard to understand. I am still an ordinary user and have a long way to go before becoming a developer.
Thank you for sharing your experience and proposed signature. Another question, which version of StereoSeq data are you currently using? The reader was designed for the version 7.0.0.
We're using 7.0.0. We know it is updated to version8, but we're hesitating whether to use version8.
Stereo-seq FF V1.3 or Stereo-seq FFPE can only be analyzed with version8, so I recommend version8.
I'd like to use shapes, but it's worth noting that the x and y columns in stereoseq are the upper-left corner coordinates ( tereopy/issues/366 )
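To make the consequence concrete: if x and y mark the upper-left corner, the square for a bin extends by bin_size in both directions and its centre is offset by half a bin. A minimal sketch, where bin_square is a hypothetical helper and the coordinates are made up:

```python
def bin_square(x, y, bin_size):
    """Return the corners and centre of a square bin from its upper-left corner.

    Assumption: the image convention is used, with y increasing downwards.
    """
    corners = [
        (x, y),                        # upper left
        (x + bin_size, y),             # upper right
        (x + bin_size, y + bin_size),  # lower right
        (x, y + bin_size),             # lower left
    ]
    centre = (x + bin_size / 2, y + bin_size / 2)
    return corners, centre


corners, centre = bin_square(100, 200, bin_size=50)
```

A reader that treated x and y as centres instead would place every shape half a bin away from its true position.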
Thanks @z-spider for joining the conversation. Maybe an approach worth exploring is to use https://github.com/STOmics/Stereopy internally in our stereoseq() reader, so that we delegate the handling of the most up-to-date format to stereopy. Another could be to work towards having a direct converter to SpatialData objects in stereopy.
I won't have the bandwidth to work on this right now, but if someone is interested in exploring the above, please reach out!
No, stereopy is not a good idea. We have used stereopy before. It's hard to install and hard to use. Just ignore it.
I agree, stereopy currently only supports python=3.8