Large spatial transcriptomics datasets best practices

Open KalinNonchev opened this issue 7 months ago • 0 comments

Hi spatialdata team,

Thank you for your work on this important project!

We are currently generating multiple terabytes of multi-modal digital spatial transcriptomics data, including H&E images, gene expression, spatial coordinates, and various annotations. The initial 8 TB of data, which you can explore on HuggingFace, contains over 56 million spots across 3,780 samples. At present, each sample is stored as a separate gzip-compressed AnnData object (.h5ad.gz).

We’d appreciate any guidance on best practices for storing and managing data at this scale with SpatialData. Have you benchmarked SpatialData on similarly large spatial transcriptomics datasets? Additionally, could you share what benefits we might expect from migrating our data to this format?

Thanks!

May 12 '25 07:05 KalinNonchev