cellsnp-lite
cellsnp-lite copied to clipboard
cellsnp-lite output not working with Vireo - reporting really high read depths
I have a scATACseq dataset where I'm trying to use the mitochondrial reads to demultiplex two mouse strains. When I run cellSNP and cellsnp-lite in mode 2 using what I believe to be matching parameters and the same input files, both identify all of the same SNPs, but the read depths reported in cellSNP.base.vcf.gz and the sparse matrices are different; the depths reported by cellsnp-lite are ~100X greater. When I run vireo on these two different outputs, the cellSNP results seem to properly demultiplex my donors, whereas the cellsnp-lite results leave all cells unassigned. Importantly, I noticed in the donor_ids.tsv file for the cellsnp-lite vireo run, that all cells are reported to have the same results:
cell donor_id prob_max prob_doublet n_vars best_singlet best_doublet doublet_logLikRatio
AAACGAAAGGTCGTTT-1 unassigned 4.76e-01 4.79e-02 370 donor0 donor0,donor1 0.000
I guess I'm not sure if this is a bug with cellsnp-lite or if I did something drastically different running the two programs without realizing it. Any ideas?
Here are the two batches of code I ran and the relevant console outputs:
cellsnp-lite
cellsnp-lite \
-s ./chrM_CB.bam \
-b ./barcodes_mt_revcomp.tsv \
-O ./cellsnp-lite_DeNovo \
-p 40 \
--minMAF 0.1 \
--UMItag None \
--chrom M
--exclFLAG UNMAP,SECONDARY,QCFAIL
[I::main] start time: 2021-11-24 11:49:27
[W::check_args] Max depth set to maximum value (2147483647)
[I::main] mode 2a: pileup 1 whole chromosomes in 4787 single cells.
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-0] processing chrom M ...
[I::csp_pileup_core][Thread-0] has pileup-ed in total 370 SNPs for chrom M
[I::main] All Done!
[I::main] end time: 2021-11-24 13:12:30
[I::main] time spent: 4983 seconds.
less cellsnp-lite_DeNovo/cellSNP.base.vcf.gz | head
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
M 55 . T G . PASS AD=146211;DP=489229;OTH=247
M 165 . C A . PASS AD=142350;DP=607292;OTH=268
M 348 . T C . PASS AD=103804;DP=350149;OTH=251
vireo \
-c ./cellsnp-lite_DeNovo/ \
-N 2 \
-o ./vireo/cellsnp-lite_DeNovo \
-p 40
[vireo] Loading cell folder ...
[vireo] Demultiplex 4787 cells to 2 donors with 370 variants.
[vireo] lower bound ranges [-114928564.4, -114928564.4, -114928564.4]
[vireo] allelic rate mean and concentrations:
[[0.01 0.284 0.99 ]]
[[5.00000000e+01 2.39958576e+08 5.00000000e+01]]
[vireo] donor size before removing doublets:
donor0 donor1
2394 2394
[vireo] final donor size:
unassigned
4787
[vireo] All done: 0 min 16.0 sec
cellSNP
cellSNP \
-s ./chrM_CB.bam \
-b ./barcodes_mt_revcomp.tsv \
-O ./cellSNP_DeNovo \
-p 40 \
--minMAF=0.1 \
--UMItag=None \
--chrom=M \
--maxFLAG=4096
[cellSNP] mode 2: pileup 1 whole chromosomes in 4787 single cells.
[cellSNP] Whole genome pileupped, now merging all variants ...
[cellSNP] 408 lines in final vcf file
[cellSNP] All done: 64 min 43.5 sec
less cellSNP_DeNovo/cellSNP.base.vcf.gz | head
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chrM 55 . T G . PASS AD=1924;DP=7534;OTH=2
chrM 165 . C A . PASS AD=2100;DP=7740;OTH=0
chrM 348 . T C . PASS AD=2749;DP=7402;OTH=2
vireo \
-c ./cellSNP_DeNovo/cellSNP.cells.vcf.gz \
-N 2 \
-o ./vireo/cellSNP_DeNovo \
-p 40
[vireo] Loading cell VCF file ...
[vireo] Demultiplex 4787 cells to 2 donors with 370 variants.
[vireo] lower bound ranges [-1585667.0, -1585667.0, -430338.7]
[vireo] allelic rate mean and concentrations:
[[0.038 0.907 0.938]]
[[1857003.8 450325.5 418458.7]]
[vireo] donor size before removing doublets:
donor0 donor1
1735 3052
[vireo] final donor size:
donor0 donor1 doublet unassigned
1581 2891 286 29
[vireo] All done: 0 min 15.2 sec
Hi, thanks for your feedback. Could you share the version of cellSNP and cellsnp-lite?
The reason why the read counts are different: cellSNP (actually the dependent pysam) has a default limitation that the max_depth (i.e., max pileup-ed read count) is 8000, so you may check that every DP<=8000 in the cellSNP output. But cellsnp-lite does not have this max_depth limitation, it will pileup as many reads as possible. (we are trying to add a --maxPileup
option with similar function as max_depth, for the next release of cellsnp-lite)
The reason why cellsnp-lite+vireo does not work well: The large read counts in cellsnp-lite output makes it more likely for vireo to reach local optima so that the parameters of two donors become the same and hence vireo cannot assign the cells to certain donor.
Besides, vireo is designed for nuclear SNVs. For mito SNVs, you may want to try this tutorial, which was used by MQuad. Note that the duplicate reads should probably be removed beforehand, as there are not UMIs in your case.
Xianjie