spatialdata-io
spatialdata-io copied to clipboard
Add reader for Stereo-seq files.
Add reader for Stereo-seq files.
TODO:
- [x] Read cellbin.gef completely
- [x] Fix LabelsModel
- [x] Fix ShapesModel
- [x]
counts_per_cellpresent in .obs? - [x] Update
table.obsm["cellBorder"] - [x] Automatically retrieve dataset identifier
file format description @LucaMarconato
https://github.com/STOmics/SAW/tree/main/Documents/FileFormat
Codecov Report
Attention: Patch coverage is 45.51282% with 85 lines in your changes are missing coverage. Please review.
Project coverage is 36.98%. Comparing base (
755d475) to head (10c267a). Report is 199 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #70 +/- ##
==========================================
- Coverage 41.92% 36.98% -4.94%
==========================================
Files 16 17 +1
Lines 854 1352 +498
==========================================
+ Hits 358 500 +142
- Misses 496 852 +356
| Files | Coverage Δ | |
|---|---|---|
| src/spatialdata_io/__init__.py | 100.00% <100.00%> (ø) |
|
| src/spatialdata_io/_constants/_constants.py | 100.00% <100.00%> (ø) |
|
| src/spatialdata_io/readers/stereoseq.py | 16.66% <16.66%> (ø) |
how's this looking @LLehner ?
Closes https://github.com/scverse/spatialdata-io/issues/97
@LLehner @LucaMarconato how is this looking? should we merge?
@giovp tasks 3. and 4. from issue#97 still need to be fixed.
fantastic, what is the blocker?
fantastic, what is the blocker?
aggregation of e.g. image channels over segmentation masks doesn't work yet, perhaps the segmentation mask isn't properly linked to the table. Also plotting with rendering shapes doesn't work yet.
Is the plotting an issue in spatialdata-plot ? Can you open an issue and tag me, referencing this convo?
I looked into the points 3 and 4 from https://github.com/scverse/spatialdata-io/issues/97.
-
[x] Point 3: the labels object has integer values in {0, 1}, which the background being 0 and each cell having label index 1. This makes the aggregation not computable. Possible solutions are to introduce an arbitrary labeling by identifying the connected components, or to look if the labeling already exists in the raw data. @florianingelfinger do you know if the raw data contains such information or should we proceed with a arbitrary labeling? Also, Florian mentioned that some data is available in obsm, I will look into it to see if it can be helpful for this. Edit: I am now parsing the polygonal data from obsm. I am not converting the labels (explained in a comment below).
-
[x] Point 4: there was a bug with the
instance_keycolumn getting set toNone, I fixed it inac34e78(#70). This still doesn't fix point 4 but now I think it should be easy, probably just a string mismatch. I will look into it.
In addition to solving the above, I would also like to address the following points:
- [x] now that we support representing multiple annotations tables, I would create a table for each bin size (currently we just parse the table for the cell-level data).
- [x] currently in napari we tacitly subsample a points layer if it contains more than 100000 points. I will add a warning ~~icon next to the label~~ so that the user is warned that this happens (otherwise plotting the bin sizes of 1 leads to a unintuitive visualization). The warnings will tell how to remove this limit.
Finally, we will be working on a rasterization-based approach for rendering large collections of bins. This will come after this PR is merged, but when available, will improve the user experience around Stereo-seq data.
Many thanks for your work! To my knowledge there is no cell identifier associated with each cell in the raw data or at least we have not used one so far. I would proceed as suggested with arbitrary labeling!
I fixed all the points above, with the exception to the parsing of the labels, which I now parse as an image with two colors instead of as a labels, to avoid confusion. I have tried using scikit-image to relabel the labels image but the number of labels that I obtain and the number of cells is slightly different. This is easily fixable but I will rather skip doing this proprocessing within the reader and let the user choose to perform this if needed.
I will polish the code and make a short example notebook, after this we are good to merge.
I prepared and uploaded the notebook here; I removed the outputs because the data is not currently public.
The notebook is affected by two bugs of spatialdata-plot, that I tracked here and here. The visualization work with napari-spatialdata. @timtreis since you have the data locally, could you please have a look at them?
Anyway, since the bugs are not in spatialdata-io, now the PR is ready to merge! Thanks all for the work! 🚀