proseg v3.0.10 VisiumHD error reading the spaceranger output
Hi,
When running proseg on spaceranger output I encountered something I cannot explain: when reading in the VisiumHD data, proseg always reports more than twice as many genes as there are in the dataset:
(proseg) roman@roman-System-Product-Name:/media/roman/data/Spaceranger$ proseg --visiumhd --spaceranger-barcode-mappings /media/roman/data/Spaceranger/spaceranger_LM_exp1/VisiumHD_LM_exp1_sample1_cropped_4um/outs/barcode_mappings.parquet /media/roman/data/Spaceranger/spaceranger_LM_exp1/VisiumHD_LM_exp1_sample1_cropped_4um/outs/binned_outputs --voxel-layers 2
Using 64 threads
Finished reading zarr
Read dataset:
    474079383 transcripts
    294623 cells
    37082 genes
    0 fovs
When I look at the spaceranger web summary, the dataset contains 18,016 detected genes (which makes sense, because VisiumHD is probe-based and this is roughly the number of probes). However, proseg reports 37082 genes. I have not modified the spaceranger output before pointing proseg at it.
Thanks,
Lauritz
This is expected. Since Visium data is encoded in a sparse matrix, not a transcript table, it includes every gene in the gene panel or index, even if that gene wasn't detected. So the 37082 that proseg reports is the number of genes in the index, and not all of them are necessarily detected in the data.
With that said, I should probably filter out unexpressed genes for performance reasons.
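To illustrate the difference between indexed and detected genes, here is a minimal sketch in plain Python. It is not proseg's or Space Ranger's actual code; the gene names, the triplet layout, and the helper functions are hypothetical and only mimic a sparse count matrix in which the gene index lists every gene while the stored entries cover only non-zero counts.

```python
from collections import defaultdict

# Hypothetical toy data: the gene index lists every gene in the panel,
# while the sparse entries store only non-zero (gene_idx, bin_idx, count).
gene_index = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]
entries = [
    (0, 1, 2),
    (0, 3, 1),
    (2, 0, 3),
    (4, 1, 1),
    (4, 2, 1),
]


def total_counts_per_gene(entries):
    """Sum the counts of each gene over all bins."""
    totals = defaultdict(int)
    for gene_idx, _bin_idx, count in entries:
        totals[gene_idx] += count
    return totals


def detected_genes(gene_index, entries):
    """Return the genes with at least one count, in index order."""
    totals = total_counts_per_gene(entries)
    return [g for i, g in enumerate(gene_index) if totals.get(i, 0) > 0]


detected = detected_genes(gene_index, entries)
print(f"{len(gene_index)} genes in index, {len(detected)} detected")
# prints "5 genes in index, 3 detected"
```

Counting the length of the index gives the larger number (analogous to 37082), while counting genes with a non-zero total gives the smaller one (analogous to the 18,016 in the web summary). Dropping the undetected genes up front would be one way to do the filtering mentioned above.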