spatialdata-io Stereoseq expected directory structure

Hi Team,

Would it be possible to document the expected directory structure for the stereoseq reader? The results we got from the Stereoseq team don't follow the directory structure that this implementation assumes. If the structure were documented, including the expected folder names and which files belong in which folder, we could manually restructure our directories to match what the reader expects.

As an example, this is the results folder we receive from the Stereoseq team:

Thanks a bunch.

Jun 18 '24 17:06 aadimator

Thanks @aadimator for reporting this. @LLehner could you please have a look into this?

[ ] Specifically, I would add a line to the docstring of the stereoseq function stating which Stereo-seq data version we expect, together with a link to the technical document from the STOmics website that specifies the file structure.

@aadimator which version of the data is the screenshot referring to?

Jul 10 '24 18:07 LucaMarconato

This comment: https://github.com/scverse/spatialdata-io/pull/70#issuecomment-1658529103 is from July 31st, 2023, therefore I believe that the reader is designed for the format 7.0.0: https://github.com/STOmics/SAW/tree/0808e44619f84b67d44c063b2fd24762f6633051/Documents/FileFormat and that the latest 7.1.1 (or even 7.1.0) is not supported.

Jul 10 '24 18:07 LucaMarconato

We got this as is from the Stereoseq team, and I don't think it follows any particular format. I think I'll have to manually rename the files and place them into their expected directories. I'll try to follow the SAW 7.0.0 format for now.

Aug 01 '24 14:08 aadimator

SAW 8.0 has a new output directory structure,

from this manual: https://en.stomics.tech/service/new-saw-operation-manual.html

Aug 07 '24 06:08 z-spider

Thank you for the comment. For the moment I will restrict the reader to 7.x data, or document that it only operates on 7.x. Unfortunately we don't have the bandwidth to support the latest version at the moment, but a community contribution is welcome and we would be happy to review the code in that case.

Todo for us:

[ ] restrict or document that the stereoseq reader only works for 7.x data.
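One way the "restrict" option could look is a small guard that rejects anything other than 7.x. This is only a sketch: the function name `require_saw_7x` is hypothetical, and how the reader would obtain the SAW version of a dataset is not specified in this thread, so the version string is simply passed in.

```python
def require_saw_7x(version):
    """Raise ValueError unless `version` (e.g. "7.0.0") has major version 7.

    Hypothetical helper, not part of spatialdata-io.
    """
    # Tolerate a leading "v"/"V" and surrounding whitespace, then take the major part.
    major = version.strip().lstrip("vV").split(".")[0]
    if major != "7":
        raise ValueError(
            f"SAW version {version!r} is not supported; this reader targets 7.x data."
        )
    return version
```

Calling `require_saw_7x("8.0")` would raise, while `require_saw_7x("7.0.0")` returns the string unchanged.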

Aug 07 '24 12:08 LucaMarconato

Hi Luca, I tried https://github.com/brainfo/spatialdata-io/blob/main/src/spatialdata_io/readers/stereoseq.py

This works with the folder structure from SAW v8; see also the duplicate issue #322.

Side note 1: the datasets from the STOmics website (https://en.stomics.tech/col1357/index) do not come with a folder structure; all files sit in one directory. To test on the output folder structure from SAW v8, we could create such a structure and place the website data in it. For now I tested on in-house data, shown below with the real ID replaced by {sample_id}.
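Creating such a structure from a flat directory could be sketched as follows. The suffix-to-folder mapping is inferred from the minimal outs/ example further down in this comment and is an assumption, not an official SAW specification; whether the website datasets use these exact file names is also an assumption, and `build_outs_layout` is a hypothetical helper.

```python
from pathlib import Path
import shutil

# Hypothetical mapping from filename suffix to subfolder, inferred from the
# minimal outs/ example in this comment; adjust it to the files you actually have.
SUFFIX_TO_SUBDIR = {
    ".tissue.gef": "feature_expression",
    ".h5ad": "analysis",
    ".tif": "image",
}


def build_outs_layout(flat_dir, outs_dir):
    """Copy files from a flat directory into an outs/-style folder tree.

    Files whose names match no known suffix are left where they are.
    Returns the list of paths that were created.
    """
    flat_dir, outs_dir = Path(flat_dir), Path(outs_dir)
    copied = []
    for path in sorted(flat_dir.iterdir()):
        if not path.is_file():
            continue
        for suffix, subdir in SUFFIX_TO_SUBDIR.items():
            if path.name.endswith(suffix):
                target = outs_dir / subdir / path.name
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, target)  # copy, so the originals stay untouched
                copied.append(target)
                break
    return copied
```

The files are copied rather than moved, so the original download stays intact if the guessed mapping turns out to be wrong.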

Side note 2: In practice, for collaboration projects we only transfer the necessary data, not the entire folder structure that @z-spider copied. The unnecessary folders and files are: bam/; feature_expression/{sample_id}.raw.gef; feature_expression/.txt; analysis/.marker_features.csv; visualization.tar.gz*; {sample_id}.report.html. It doesn't hurt to keep them, i.e., to leave the outs/ folder intact with all files. Below is a minimal example:

outs
├── analysis *optional when load_analysis=False
│   ├── {sample_id}.bin20_1.0.h5ad 
│   └── {sample_id}.bin50_1.0.h5ad
├── feature_expression
│   └── {sample_id}.tissue.gef *required
└── image *optional
    ├── {sample_id}_HE_regist.tif
    └── {sample_id}_HE_tissue_cut.tif

4 directories, 5 files
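Before calling the reader, a folder can be checked against the tree above. The following is a minimal sketch: `check_minimal_outs` is a hypothetical helper, and it only encodes what the annotations in the tree say (the tissue GEF is required, analysis/ is optional when load_analysis=False, image/ is optional). How the reader itself reacts to missing files is not covered here.

```python
from pathlib import Path


def _has_match(folder, pattern):
    """True if `folder` exists and contains at least one entry matching `pattern`."""
    return folder.is_dir() and any(folder.glob(pattern))


def check_minimal_outs(outs_dir, load_analysis=True):
    """Return a list of problems found in an outs/-style folder (empty if none).

    Hypothetical helper based on the example tree, not part of spatialdata-io.
    """
    outs_dir = Path(outs_dir)
    problems = []
    # The tissue GEF is the only file marked *required in the example tree.
    if not _has_match(outs_dir / "feature_expression", "*.tissue.gef"):
        problems.append("missing feature_expression/{sample_id}.tissue.gef")
    # analysis/ is marked optional only when load_analysis=False.
    if load_analysis and not _has_match(outs_dir / "analysis", "*.h5ad"):
        problems.append("missing analysis/*.h5ad (expected when load_analysis=True)")
    return problems
```

An empty list means the folder matches the minimal layout; the image/ folder is never reported because it is optional.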

from spatialdata_io import stereoseq_v8
sdata = stereoseq_v8('path/to/outs') # outs/ in the saw8 output directory

Then I got the following sdata:

SpatialData object
├── Images
│     ├── '{sample_id}_HE_regist': DataTree[cyx] (3, 23520, 23520), (3, 11760, 11760), (3, 5880, 5880), (3, 2940, 2940), (3, 1470, 1470)
│     └── '{sample_id}_HE_tissue_cut': DataTree[cyx] (1, 23520, 23520), (1, 11760, 11760), (1, 5880, 5880), (1, 2940, 2940), (1, 1470, 1470)
├── Points
│     ├── 'analysis_bin20_points': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'analysis_bin50_points': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'bin1_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'bin5_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'bin10_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'bin20_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'bin50_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'bin100_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
│     ├── 'bin150_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
│     └── 'bin200_genes': DataFrame with shape: (<Delayed>, 2) (2D points)
└── Tables
      ├── 'analysis_bin20': AnnData (199772, 28999)
      ├── 'analysis_bin50': AnnData (33366, 28999)
      ├── 'bin1_table': AnnData (2636982, 28999)
      ├── 'bin5_table': AnnData (1496312, 28999)
      ├── 'bin10_table': AnnData (652510, 28999)
      ├── 'bin20_table': AnnData (199772, 28999)
      ├── 'bin50_table': AnnData (33366, 28999)
      ├── 'bin100_table': AnnData (8575, 28999)
      ├── 'bin150_table': AnnData (3888, 28999)
      └── 'bin200_table': AnnData (2241, 28999)
with coordinate systems:
    ▸ 'global', with elements:
        {sample_id}_HE_regist (Images), {sample_id}_HE_tissue_cut (Images), analysis_bin20_points (Points), analysis_bin50_points (Points), bin1_genes (Points), bin5_genes (Points), bin10_genes (Points), bin20_genes (Points), bin50_genes (Points), bin100_genes (Points), bin150_genes (Points), bin200_genes (Points)

Oct 14 '25 21:10 brainfo

Hi, great to hear that you found a workaround. Thanks for sharing!

Oct 15 '25 22:10 LucaMarconato
