`spatialdata` support in `segger`

Open LucaMarconato opened this issue 1 year ago • 2 comments

spatialdata support in segger

Describing use cases and a possible strategy to enable spatialdata support.

Use cases

These use cases can be considered as incremental goals, to accomplish in this order:

  1. flexibility to work with processed Xenium and MERSCOPE data, not just raw data (with the processed data stored as a SpatialData Zarr object)
  2. extend segger to new transcripts-based data types (e.g. seqFISH)
  3. extend segger to bins-based data types (e.g. Visium HD, Stereo-seq, Open-ST)
  4. enable napari-spatialdata visualization when Xenium Explorer is not available (i.e. for non-Xenium data)
  5. enable spatialdata-based tools like bento-tools and sopa.

Method

Numbers correspond to the above list and they depend on each other as follows: 1 -> 4 and 1 -> 2 -> 3 -> 5.

  1. add a new subclass of SpatialTranscriptomicsSample that accepts SpatialData objects and reproduces the Xenium + MERSCOPE support. The subclass will reimplement some methods of the base class but keep the API compatible with the segger pipeline
  2. do the same for the STSampleParquet class
  3. test segger on a new transcripts-based technology
  4. test segger on Visium HD data; this will require some modification of the Visium HD data to make it look like transcript-based data. Doing it once will make it work for every bins-based technology, thanks to the SpatialData abstraction.
  5. create a parser that writes the produced results into a new SpatialData object (or puts the predictions into the original one)
  6. test napari-spatialdata on the newly created object.
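One possible reading of "make it look like transcript-based data" in step 4 is to expand each bin's counts into that many pseudo-transcripts placed inside the bin. The sketch below is only an illustration of that idea, not segger's or spatialdata's method: the `BinCount` record, the function name, the output keys (`transcript_id`, `x`, `y`, `feature_name`) and the uniform jitter are all assumptions.

```python
import random
from typing import Iterable, Iterator, NamedTuple


class BinCount(NamedTuple):
    """One (bin, gene) entry of a bins-based assay (hypothetical record)."""
    x: float            # x coordinate of the bin centre
    y: float            # y coordinate of the bin centre
    feature_name: str   # gene name
    count: int          # number of molecules of this gene in the bin


def bins_to_pseudo_transcripts(
    bins: Iterable[BinCount],
    bin_size: float,
    seed: int = 0,
) -> Iterator[dict]:
    """Yield one point per counted molecule, jittered uniformly within its bin.

    Assumption: bins are axis-aligned squares of side `bin_size` centred
    on (x, y). Bins with a count of zero produce no points.
    """
    rng = random.Random(seed)  # seeded so the expansion is reproducible
    half = bin_size / 2.0
    transcript_id = 0
    for b in bins:
        for _ in range(b.count):
            yield {
                "transcript_id": transcript_id,
                "x": b.x + rng.uniform(-half, half),
                "y": b.y + rng.uniform(-half, half),
                "feature_name": b.feature_name,
            }
            transcript_id += 1


if __name__ == "__main__":
    example = [BinCount(1.0, 1.0, "GeneA", 3), BinCount(3.0, 1.0, "GeneB", 1)]
    for point in bins_to_pseudo_transcripts(example, bin_size=2.0):
        print(point)
```

Placing all points at the bin centre instead of jittering them would be an equally valid choice; which of the two (if either) suits segger's graph construction is an open question in this issue.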

detailed lists of tasks

segger.data

  • [ ] data/parquet/_settings/xenium.yaml: not sure about this file.
  • [ ] data/parquet/_utils.py: a quick way to enable spatialdata support is to keep using these functions (even though most of them are already implemented in spatialdata) by making the class SpatialTranscriptomicsSample (or STSampleParquet?) return the right paths of the .parquet files inside the SpatialData .zarr store. This means that in-memory SpatialData objects would not be supported, at least in the beginning. In this case, this file does not require modifications.
  • [ ] sample.py: similar comment to the above. I would create a helper function that takes a SpatialData object (stored on disk) and creates an STSampleParquet by passing the right paths. This way STSampleParquet doesn't need to know that it comes from a SpatialData object. Some of the functions in sample.py are already available in the spatialdata package.
  • [ ] segger.data.__init__: include new classes and functions developed for the files above.
  • [ ] segger.data.constants: no need to add SpatialData-specific files
  • [ ] segger.data.io: we can subclass SpatialTranscriptomicsSample to support generic SpatialData objects so that we can reuse SpatialTranscriptomicsSample without having to adapt all the code. Note that some of the functions implemented in SpatialTranscriptomicsSample are available in the spatialdata package.
  • [ ] README.md: update to include info on SpatialData support
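The "return the right paths" idea from the _utils.py and sample.py items could look roughly like the sketch below. It assumes the on-disk layout that spatialdata uses, as far as I know, in current versions (`points/<name>/points.parquet` and `shapes/<name>/shapes.parquet` inside the .zarr store). The default element names, the `SpatialDataParquetPaths` container and the function name are hypothetical. How the resulting paths are handed to STSampleParquet is deliberately left out, since that depends on its constructor.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass(frozen=True)
class SpatialDataParquetPaths:
    """Parquet locations inside a SpatialData .zarr store (hypothetical container)."""
    transcripts: Path
    boundaries: Path


def locate_parquet_in_zarr(
    zarr_path: Union[str, Path],
    points_key: str = "transcripts",      # assumed name of the points element
    shapes_key: str = "cell_boundaries",  # assumed name of the shapes element
) -> SpatialDataParquetPaths:
    """Resolve the transcript and boundary .parquet paths of an on-disk store.

    Assumed layout (not verified against every spatialdata version):
        <store>.zarr/points/<points_key>/points.parquet
        <store>.zarr/shapes/<shapes_key>/shapes.parquet
    """
    root = Path(zarr_path)
    transcripts = root / "points" / points_key / "points.parquet"
    boundaries = root / "shapes" / shapes_key / "shapes.parquet"

    # Fail early with a readable message instead of deep inside the pipeline.
    missing = [str(p) for p in (transcripts, boundaries) if not p.exists()]
    if missing:
        raise FileNotFoundError(
            "Expected parquet data not found in the store: " + ", ".join(missing)
        )
    return SpatialDataParquetPaths(transcripts=transcripts, boundaries=boundaries)
```

Because only paths are resolved, this approach works for stores written to disk and, as noted above, not for in-memory SpatialData objects.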

segger.cli

  • [ ] add a new .yaml configuration in segger.cli.configs.create_dataset based on a generic SpatialData .zarr file
  • [ ] create_dataset.py: add a new dataset_type spatialdata-zarr, which uses a new SpatialDataSample class

segger.models

  • no changes needed

segger.prediction

  • no changes needed

segger.training

  • no changes needed

segger.validation

  • [ ] add a function that exports to a new SpatialData object (or make it possible to extend the original one if the data came from a SpatialData object; it's easy to just create a new one)
  • [ ] test an example with napari-spatialdata

additional comments

  • [ ] for simplicity, one can start with data that does not require considering the coordinate transformation.

LucaMarconato, Oct 04 '24 10:10

@EliHei2 gonna meet and brainstorm on this next monday.

LucaMarconato, Oct 04 '24 10:10

Hi @EliHei2 @LucaMarconato, I saw you mentioned extending segger to bins data types like VisiumHD, which definitely sounds super interesting: how are you planning to do that? I guess something like decomposing a bin into multiple points?

I'm excited to incorporate it in Sopa once the SpatialData support is finalized!

quentinblampey, Nov 20 '24 07:11