
proseg v3.0.10 VisiumHD error reading the spaceranger output

Open LauritzMia opened this issue 2 months ago • 1 comments

Hi,

When using proseg on spaceranger output I encountered something I cannot explain: when reading in the Visium HD data, proseg always reports more than twice as many genes as there are in the dataset:

(proseg) roman@roman-System-Product-Name:/media/roman/data/Spaceranger$ proseg --visiumhd --spaceranger-barcode-mappings /media/roman/data/Spaceranger/spaceranger_LM_exp1/VisiumHD_LM_exp1_sample1_cropped_4um/outs/barcode_mappings.parquet /media/roman/data/Spaceranger/spaceranger_LM_exp1/VisiumHD_LM_exp1_sample1_cropped_4um/outs/binned_outputs --voxel-layers 2
Using 64 threads
Finished reading zarr
Read dataset:
474079383 transcripts
294623 cells
37082 genes
0 fovs

When I look at the spaceranger web summary, the dataset contains 18,016 detected genes (which makes sense, because Visium HD is probe-based and this is roughly the number of probes). However, proseg reports 37082 genes. I did not modify the spaceranger output before pointing proseg at it.

Thanks,

Lauritz

LauritzMia avatar Oct 20 '25 13:10 LauritzMia

This is expected. Since Visium data is encoded in a sparse matrix, not a transcript table, it includes every gene in the gene panel or index, even if that gene wasn't detected. Not all of these 37k genes are necessarily detected in the data.

With that said, I should probably filter out unexpressed genes for performance reasons.
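
To illustrate the difference between the two gene counts, here is a minimal Python sketch on toy data. It is not proseg's code and does not read spaceranger files; the names `panel_genes`, `entries`, and `detected_gene_indices` are hypothetical and exist only for this example. It assumes the sparse matrix is held as (bin, gene, count) triples, with the gene list stored separately and covering the whole panel.

```python
# Toy gene list: every gene in the panel/index, whether detected or not.
# (Hypothetical names; real data would have tens of thousands of entries.)
panel_genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]

# Toy sparse count matrix as (bin_index, gene_index, count) triples.
# Only nonzero counts are stored, so undetected genes have no entries at all.
entries = [
    (0, 0, 3),
    (0, 2, 1),
    (1, 2, 4),
    (2, 0, 2),
]


def detected_gene_indices(entries):
    """Return the sorted indices of genes with at least one nonzero count."""
    return sorted({gene for _bin, gene, count in entries if count > 0})


detected = detected_gene_indices(entries)

# The matrix dimension follows the gene list, not the detected genes.
n_panel = len(panel_genes)
n_detected = len(detected)

# Filtering out unexpressed genes keeps only the detected ones.
filtered_genes = [panel_genes[i] for i in detected]

print(f"{n_panel} genes in panel, {n_detected} detected: {filtered_genes}")
```

In this toy case the gene dimension is 5 while only 2 genes have any counts, which is the same kind of gap as 37082 versus 18,016 above.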

dcjones avatar Oct 20 '25 15:10 dcjones