souporcell
Recommendations for SNP Filtering
Hello. I have run souporcell, and it seems like pretty robust software as far as I can tell. My concern is that I am ending up with over 1 million SNPs for a pool of 4 samples in my cluster_genotypes.vcf file. I have looked at this post and this post to get an idea of how to select the most reliable SNPs in the VCF file. I can reduce the data set to ~1/3 of the calls by fixing the number of samples (NS) field and looking at the allele counts, but I haven't come up with a good way of measuring the "quality" of a SNP. I have considered using the GN or GO fields, but I am not sure how to interpret those values. Any advice would be greatly appreciated. Thank you.
That seems extreme in my experience. I do some filtering of variants in the ambient RNA detection step. The basic idea is that for a good variant, each cluster should have an allele fraction of 0%, 50%, or 100% (with some wiggle room around 50% for random sampling). In addition to that, there is ambient RNA, which carries the average allele fraction of the whole experiment for that allele. So instead each cluster should have (0+soup)/total, (0.5*(total-soup) + soup)/total, or (1*(total-soup)+soup)/total. If instead we just see each cluster having a small % of the alt allele, the variant gets filtered.
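To make the arithmetic above concrete, here is a minimal sketch of one way to read those formulas. It is an illustration only, not souporcell's actual code: `ambient_fraction` (the share of reads coming from the soup), `mean_alt_fraction` (the experiment-wide alt allele fraction), the helper names, and the `tolerance` value are all assumptions introduced here.

```python
def expected_alt_fractions(ambient_fraction, mean_alt_fraction):
    """Expected alt allele fraction for genotypes 0/0, 0/1 and 1/1.

    Assumed reading of the formulas above: a share `ambient_fraction` of the
    reads comes from the soup and carries the experiment-wide alt fraction;
    the rest carries the cluster's true genotype fraction (0, 0.5 or 1).
    """
    return [
        (1.0 - ambient_fraction) * genotype_fraction
        + ambient_fraction * mean_alt_fraction
        for genotype_fraction in (0.0, 0.5, 1.0)
    ]


def looks_like_background(cluster_alt_fractions, ambient_fraction,
                          mean_alt_fraction, tolerance=0.05):
    """True if every cluster shows only the small soup-level alt fraction.

    `tolerance` is a hypothetical wiggle-room parameter, not a souporcell option.
    """
    soup_only = expected_alt_fractions(ambient_fraction, mean_alt_fraction)[0]
    return all(abs(f - soup_only) <= tolerance for f in cluster_alt_fractions)


# Example: 10% ambient RNA, experiment-wide alt fraction of 20%.
fractions = expected_alt_fractions(0.1, 0.2)
flat = looks_like_background([0.02, 0.03, 0.01, 0.02], 0.1, 0.2)
mixed = looks_like_background([0.02, 0.47, 0.92, 0.02], 0.1, 0.2)
```

Under these assumptions, a variant where every cluster sits near the soup-only level would be flagged, while one with clusters near the heterozygous or homozygous-alt levels would not.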
These variants will have BACKGROUND in the filter field. So I don't remove them, I just mark them as background. Maybe that wasn't the most descriptive word.
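If you want to drop the marked variants yourself, a small sketch like the following could work. It assumes the standard tab-separated VCF layout, where FILTER is the seventh column and multiple filter names are separated by semicolons; the function name and the example record are made up for illustration.

```python
def has_background_filter(vcf_line):
    """Return True if a VCF data line lists BACKGROUND in its FILTER column."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Column 7 (index 6) is FILTER; several filters are joined with ';'.
    return "BACKGROUND" in cols[6].split(";")


# Made-up example records (not real souporcell output).
marked = "1\t100\t.\tA\tG\t50\tBACKGROUND\tNS=4\tGT\t0/0\t0/0\t0/0\t0/0"
unmarked = "1\t200\t.\tC\tT\t50\t.\tNS=4\tGT\t0/0\t0/1\t1/1\t0/1"
marked_result = has_background_filter(marked)
unmarked_result = has_background_filter(unmarked)
```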
Beyond this, variant calling is quite challenging in single-cell data, especially in mixed-sample data. You can filter by allele fraction, but this will remove true variants with a minority allele. You can also look at the GT for each variant; if most are ./. or 0/0, then the variant is probably also bad.
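The GT check could be sketched roughly as below. This is plain Python over tab-separated VCF lines, not part of souporcell; the function names, the `max_fraction` cutoff of 0.5 (my reading of "most"), and the example records are assumptions.

```python
# Genotypes treated as uninformative: missing or homozygous reference,
# in both unphased ('/') and phased ('|') notation.
UNINFORMATIVE = {"./.", ".|.", ".", "0/0", "0|0"}


def genotypes(vcf_line):
    """Extract the GT string of every sample column from one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    gt_index = cols[8].split(":").index("GT")  # column 9 is FORMAT
    return [sample.split(":")[gt_index] for sample in cols[9:]]


def mostly_uninformative(vcf_line, max_fraction=0.5):
    """True if more than `max_fraction` of samples are ./. or 0/0.

    `max_fraction` is a hypothetical threshold; tune it for your data.
    """
    gts = genotypes(vcf_line)
    n_bad = sum(gt in UNINFORMATIVE for gt in gts)
    return n_bad / len(gts) > max_fraction


# Made-up example records with four clusters each.
weak = "1\t100\t.\tA\tG\t50\t.\tNS=4\tGT:AO:RO\t0/0:0:10\t./.:0:0\t0/1:5:5\t0/0:0:8"
good = "1\t200\t.\tC\tT\t50\t.\tNS=4\tGT:AO:RO\t0/1:5:5\t1/1:9:0\t0/0:0:8\t0/1:4:6"
weak_result = mostly_uninformative(weak)
good_result = mostly_uninformative(good)
```

Here the first record has three of four clusters at ./. or 0/0 and would be dropped, while the second has only one and would be kept.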
Thank you for the quick response and advice! I took your advice and got it to ~160k sites. Still pretty high, but that has helped. I will keep at it and hopefully I can figure out why I end up with so many sites.
If you find a good signal to use to filter variants, please post it here and I'll add it to the tool.