`spatialdata` support in `segger`
This issue describes use cases and a possible strategy for enabling `spatialdata` support.
Use cases
These use cases can be considered as incremental goals, to accomplish in this order:
1. flexibility to work with processed Xenium and MERSCOPE data, not just raw data (with the data stored as a SpatialData Zarr object)
2. extend `segger` to new transcripts-based data types (e.g. seqFISH)
3. extend `segger` to bins-based data types (e.g. Visium HD, Stereo-seq, Open-ST)
4. enable `napari-spatialdata` visualization when Xenium Explorer is not available (i.e. for non-Xenium data)
5. enable `spatialdata`-based tools like `bento-tools` and `sopa`.
Method
Numbers correspond to the above list and they depend on each other as follows: 1 -> 4 and 1 -> 2 -> 3 -> 5.
- add a new subclass of `SpatialTranscriptomicsSample` that accepts `SpatialData` objects and reproduces the `Xenium` + `MERSCOPE` support. The subclass will reimplement some methods of the base class but keep the API compatible with the `segger` pipeline
- do the same for the `STSampleParquet` class
- test segger on a new transcripts-based technology
- test segger on Visium HD data; this will require some modification of the Visium HD data to make it look like transcript-based data. Doing it once will make it work for every bins-based technology thanks to the `SpatialData` abstraction.
- create a parser that puts the produced results into a new `SpatialData` object (or puts the predictions into the original one)
- test `napari-spatialdata` on the newly created object.
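
The Visium HD step leaves open how bins are made to look like transcripts. One possible reading, which is only an assumption and not a decision taken in this issue, is to expand each bin's per-gene count into that many points scattered uniformly inside the bin. The `Bin` and `Point` types, the `feature_name` field, and `bins_to_points` are hypothetical names used for illustration; nothing here is part of `segger` or `spatialdata`.

```python
import random
from typing import Dict, Iterable, List, NamedTuple


class Bin(NamedTuple):
    """Hypothetical square bin: centre, side length, and per-gene counts."""
    x: float
    y: float
    size: float
    counts: Dict[str, int]


class Point(NamedTuple):
    """Hypothetical transcript-like record: a position plus a gene name."""
    x: float
    y: float
    feature_name: str


def bins_to_points(bins: Iterable[Bin], seed: int = 0) -> List[Point]:
    """Expand every (bin, gene, count) into `count` points inside the bin.

    Positions are drawn uniformly within the bin square, so the sub-bin
    location of each point is synthetic (an assumption of this sketch).
    """
    rng = random.Random(seed)  # seeded so the expansion is reproducible
    points: List[Point] = []
    for b in bins:
        half = b.size / 2
        for gene, count in b.counts.items():
            for _ in range(count):
                points.append(
                    Point(
                        x=b.x + rng.uniform(-half, half),
                        y=b.y + rng.uniform(-half, half),
                        feature_name=gene,
                    )
                )
    return points
```

Placing all points at the bin centre instead would avoid inventing sub-bin positions, at the cost of many coincident points; which variant suits the graph construction in `segger` would need testing.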
detailed lists of tasks
segger.data
- [ ] `data/parquet/_settings/xenium.yaml`: not sure about this file.
- [ ] `data/parquet/_utils.py`: a quick way to enable `spatialdata` support is to keep using these functions (even if they are mostly implemented in `spatialdata`) by making the class `SpatialTranscriptomicsSample` (or `STSampleParquet`?) return the right paths of the `.parquet` files inside the SpatialData `.zarr` store. This means that in-memory `SpatialData` objects would not be supported, at least in the beginning. In this case, this file does not require modifications.
- [ ] `sample.py`: similar comment to the above. I would create a helper function that takes a `SpatialData` object (stored on disk) and creates an `STSampleParquet` by passing the right paths. This way `STSampleParquet` doesn't need to know that it comes from a `SpatialData` object. Some of the functions in `sample.py` are already available in the `spatialdata` package.
- [ ] `segger.data.__init__`: include new classes and functions developed for the files above.
- [ ] `segger.data.constants`: no need to add `SpatialData`-specific files
- [ ] `segger.data.io`: we can subclass `SpatialTranscriptomicsSample` to support generic `SpatialData` objects so that we can reuse `SpatialTranscriptomicsSample` without having to adapt all the code. Note that some of the functions implemented in `SpatialTranscriptomicsSample` are available in the `spatialdata` package.
- [ ] `README.md`: update to include info on the support of `SpatialData`
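
A minimal sketch of the path lookup that the helper for `sample.py` could build on. It assumes that an on-disk SpatialData store keeps points under `<store>/points/<element>/points.parquet` and shapes under `<store>/shapes/<element>/shapes.parquet`; this layout should be checked against the installed `spatialdata` version. The function name, its parameters, and the returned keys are hypothetical, and the `STSampleParquet` construction is left out because its signature is not discussed here.

```python
from pathlib import Path
from typing import Dict, Union


def parquet_paths_in_zarr(
    zarr_path: Union[str, Path],
    points_key: str,
    shapes_key: str,
) -> Dict[str, Path]:
    """Locate the parquet data of one points and one shapes element.

    Assumes the store layout <store>/points/<key>/points.parquet and
    <store>/shapes/<key>/shapes.parquet (an assumption of this sketch).
    """
    store = Path(zarr_path)
    paths = {
        "transcripts": store / "points" / points_key / "points.parquet",
        "boundaries": store / "shapes" / shapes_key / "shapes.parquet",
    }
    # Fail early with the full list of missing entries.
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Not found in {store}: {missing}")
    return paths
```

Because only paths are passed on, the downstream class would not need to know that the files live inside a SpatialData store, which is the point made in the `sample.py` item.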
segger.cli
- [ ] add a new `.yaml` configuration in `segger.cli.configs.create_dataset` based on a generic SpatialData `.zarr` file
- [ ] `create_dataset.py`: add a new `dataset_type` `spatialdata-zarr`, which uses a new `SpatialDataSample` class
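
To illustrate how the new `dataset_type` could slot in, here is a rough sketch of a lookup from type name to sample class. The classes below are stand-ins defined only for this example, the `"xenium"` and `"merscope"` keys are assumed names, and `make_sample` is hypothetical; the actual logic in `create_dataset.py` may be organized differently.

```python
class _Sample:
    """Stand-in for a segger sample class; it only records its input path."""

    def __init__(self, path):
        self.path = path


class XeniumSample(_Sample):
    pass


class MerscopeSample(_Sample):
    pass


class SpatialDataSample(_Sample):
    """Placeholder for the proposed class backed by a SpatialData .zarr store."""


# Mapping from the CLI value to the class that handles it.
SAMPLE_CLASSES = {
    "xenium": XeniumSample,
    "merscope": MerscopeSample,
    "spatialdata-zarr": SpatialDataSample,
}


def make_sample(dataset_type, path):
    """Instantiate the sample class registered for `dataset_type`."""
    try:
        sample_cls = SAMPLE_CLASSES[dataset_type]
    except KeyError:
        known = ", ".join(sorted(SAMPLE_CLASSES))
        raise ValueError(
            f"Unknown dataset_type {dataset_type!r}; expected one of: {known}"
        ) from None
    return sample_cls(path)
```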
segger.models
- no changes needed
segger.prediction
- no changes needed
segger.training
- no changes needed
segger.validation
- [ ] add an export function to a new `SpatialData` object (or make it possible to extend the original one if the data came from a `SpatialData` object; it's easy to just create a new one)
- [ ] test an example with `napari-spatialdata`
additional comments
- [ ] for simplicity, one can start with data that does not require considering the coordinate transformation.
@EliHei2 gonna meet and brainstorm on this next monday.
Hi @EliHei2 @LucaMarconato, I saw you mentioned extending segger to bins data types like VisiumHD, which definitely sounds super interesting: how are you planning to do that? I guess something like decomposing a bin into multiple points?
I'm excited to incorporate it in Sopa once the SpatialData support is finalized!