Add reader for Stereo-seq files.

Open LLehner opened this issue 2 years ago • 13 comments

Add reader for Stereo-seq files.

TODO:

  • [x] Read cellbin.gef completely
  • [x] Fix LabelsModel
  • [x] Fix ShapesModel
  • [x] counts_per_cell present in .obs?
  • [x] Update table.obsm["cellBorder"]
  • [x] Automatically retrieve dataset identifier

LLehner avatar Jul 18 '23 17:07 LLehner

file format description @LucaMarconato

https://github.com/STOmics/SAW/tree/main/Documents/FileFormat

LLehner avatar Jul 31 '23 14:07 LLehner

Codecov Report

Attention: Patch coverage is 45.51282% with 85 lines in your changes are missing coverage. Please review.

Project coverage is 36.98%. Comparing base (755d475) to head (10c267a). Report is 199 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
- Coverage   41.92%   36.98%   -4.94%     
==========================================
  Files          16       17       +1     
  Lines         854     1352     +498     
==========================================
+ Hits          358      500     +142     
- Misses        496      852     +356     
Files Coverage Δ
src/spatialdata_io/__init__.py 100.00% <100.00%> (ø)
src/spatialdata_io/_constants/_constants.py 100.00% <100.00%> (ø)
src/spatialdata_io/readers/stereoseq.py 16.66% <16.66%> (ø)

... and 9 files with indirect coverage changes

codecov-commenter avatar Aug 03 '23 13:08 codecov-commenter

[image] Fails for me, do you know why?

timtreis avatar Sep 05 '23 11:09 timtreis

how's this looking @LLehner ?

giovp avatar Feb 05 '24 17:02 giovp

Closes https://github.com/scverse/spatialdata-io/issues/97

LucaMarconato avatar Feb 15 '24 16:02 LucaMarconato

@LLehner @LucaMarconato how is this looking? should we merge?

giovp avatar Apr 24 '24 08:04 giovp

@giovp tasks 3 and 4 from issue #97 still need to be fixed.

LLehner avatar Apr 24 '24 20:04 LLehner

fantastic, what is the blocker?

giovp avatar Apr 25 '24 08:04 giovp

> fantastic, what is the blocker?

aggregation of e.g. image channels over segmentation masks doesn't work yet, perhaps the segmentation mask isn't properly linked to the table. Also plotting with rendering shapes doesn't work yet.

LLehner avatar Apr 28 '24 21:04 LLehner

Is the plotting an issue in spatialdata-plot? Can you open an issue and tag me, referencing this convo?

timtreis avatar May 06 '24 09:05 timtreis

I looked into the points 3 and 4 from https://github.com/scverse/spatialdata-io/issues/97.

  • [x] Point 3: the labels object has integer values in {0, 1}, with the background being 0 and every cell having label index 1. This makes the aggregation not computable. Possible solutions are to introduce an arbitrary labeling by identifying the connected components, or to check whether a labeling already exists in the raw data. @florianingelfinger do you know if the raw data contains such information, or should we proceed with an arbitrary labeling? Also, Florian mentioned that some data is available in obsm; I will look into it to see if it can be helpful for this. Edit: I am now parsing the polygonal data from obsm. I am not converting the labels (explained in a comment below).

  • [x] Point 4: there was a bug with the instance_key column getting set to None, I fixed it in ac34e78 (#70). This still doesn't fix point 4 but now I think it should be easy, probably just a string mismatch. I will look into it.
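A string mismatch of the kind suspected for point 4 can be checked without any spatialdata-specific API. The sketch below is only an illustration: `diagnose_key_mismatch` is a hypothetical helper, not part of spatialdata-io, and the id values are made up. It compares the instance identifiers stored in a table with the index of the annotated element, first as-is and then after casting both sides to `str`.

```python
def diagnose_key_mismatch(table_ids, element_ids):
    """Compare table instance ids with element ids (hypothetical helper).

    Returns how many ids match as-is and how many match after casting both
    sides to str; a gap between the two counts suggests a dtype mismatch.
    """
    table_set = set(table_ids)
    element_set = set(element_ids)
    exact = len(table_set & element_set)
    as_str = len({str(i) for i in table_set} & {str(i) for i in element_set})
    return {"exact": exact, "after_str_cast": as_str, "dtype_mismatch": as_str > exact}


# Made-up example: the table stores ids as strings, the element uses integers.
report = diagnose_key_mismatch(["1", "2", "3"], [1, 2, 3, 4])
print(report)
```

If `dtype_mismatch` is true, casting the table column (or the element index) to a common dtype is likely enough to restore the link.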

In addition to solving the above, I would also like to address the following points:

  • [x] now that we support representing multiple annotations tables, I would create a table for each bin size (currently we just parse the table for the cell-level data).
  • [x] currently in napari we tacitly subsample a points layer if it contains more than 100000 points. I will add a warning ~~icon next to the label~~ so that the user is warned that this happens (otherwise plotting the bin sizes of 1 leads to an unintuitive visualization). The warning will explain how to remove this limit.
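The subsample-and-warn behaviour described in the last item can be sketched in a few lines. This is not the napari-spatialdata implementation, which is not shown in this thread; `subsample_points`, `MAX_POINTS` and the `max_points=None` switch are hypothetical names chosen for illustration, and only the 100000 threshold comes from the comment above.

```python
import random
import warnings

MAX_POINTS = 100_000  # threshold mentioned in the comment above


def subsample_points(points, max_points=MAX_POINTS, seed=0):
    """Return at most max_points points, warning when a subsample is taken."""
    if max_points is None or len(points) <= max_points:
        return list(points)
    warnings.warn(
        f"Showing a random subset of {max_points} out of {len(points)} points; "
        "pass max_points=None to display all of them.",
        UserWarning,
        stacklevel=2,
    )
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(list(points), max_points)


# Record the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    shown = subsample_points(range(250_000))
print(len(shown), len(caught))
```

Putting the way to lift the limit in the warning text itself means the user does not have to search the documentation for it.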

Finally, we will be working on a rasterization-based approach for rendering large collections of bins. This will come after this PR is merged, but when available, will improve the user experience around Stereo-seq data.

LucaMarconato avatar May 10 '24 14:05 LucaMarconato

Many thanks for your work! To my knowledge there is no cell identifier associated with each cell in the raw data or at least we have not used one so far. I would proceed as suggested with arbitrary labeling!

florianingelfinger avatar May 13 '24 06:05 florianingelfinger

I fixed all the points above, with the exception of the parsing of the labels, which I now parse as an image with two colors instead of as labels, to avoid confusion. I have tried using scikit-image to relabel the labels image, but the number of labels that I obtain differs slightly from the number of cells. This is easily fixable, but I would rather skip this preprocessing within the reader and let the user choose to perform it if needed.
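For users who want to do the relabeling themselves, the sketch below shows connected-component labeling of a binary mask in pure Python. It is an illustration only: in practice a library routine such as `skimage.measure.label` would be used, and `label_components` is a hypothetical name. The toy mask also shows one plausible reason for a count mismatch (an assumption, not confirmed in this thread): cells that touch in the mask merge into a single component.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary 2D mask.

    Returns (labels, n): labels holds 0 for background and 1..n for each
    connected component, n is the number of components found.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    n = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or labels[r][c] != 0:
                continue  # background, or already assigned to a component
            n += 1
            labels[r][c] = n
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the new component
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = n
                        queue.append((ny, nx))
    return labels, n


# Three cells were drawn into this toy mask, but two of them touch.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
labels, n_components = label_components(mask)
print(n_components)
```

Comparing `n_components` with the number of rows in the cell-level table is a quick way to see whether the two disagree before attempting any aggregation.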

I will polish the code and make a short example notebook, after this we are good to merge.

LucaMarconato avatar May 22 '24 12:05 LucaMarconato

I prepared and uploaded the notebook here; I removed the outputs because the data is not currently public.

The notebook is affected by two bugs in spatialdata-plot, which I tracked here and here. The visualization works with napari-spatialdata. @timtreis since you have the data locally, could you please have a look at them?

Anyway, since the bugs are not in spatialdata-io, now the PR is ready to merge! Thanks all for the work! 🚀

LucaMarconato avatar May 24 '24 14:05 LucaMarconato