False positive findings of de novo calling

Open WeiCSong opened this issue 4 years ago • 0 comments

Dear Dan, thank you for your excellent tool. I'm analyzing WGS data for a trio and want to identify the de novo SVs. I ran sv2 separately for each person, like this:

sv2 -i WOC5_3.final.bam -b cnmops.bed delly.bed manta.bed lumpy.bed -snv WOC5_3.sentieon.snp.vcf.gz -p OC5_3.ped -o WOC5_3 -merge -M

and got genotype data in .vcf files (WOC5_3 is the ID of the child). I got ~500 SVs for each person, and I wrote a simple script to extract the SVs that appeared only in the child. However, this gave me ~250 de novo SVs, which is apparently wrong, because de novo SVs are extremely rare (~0.1% of total SVs). I guess I misunderstood the genotype matrix, and I would like to learn the right procedure from you.
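To make the comparison concrete, here is a minimal sketch of a child-only filter of the kind described above. It is not the actual script, and it does not use any SV2 API. The `SV` record, the helper names, and the 50% reciprocal-overlap rule for deciding that two calls are the same event are all assumptions made for illustration, since breakpoints from separately genotyped samples rarely match exactly.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical record for one genotyped SV (not an SV2 data structure)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "DUP"
    gt: str      # genotype string, e.g. "0/1", "0/0", "./."


def carries_alt(gt: str) -> bool:
    """True if the genotype contains at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def reciprocal_overlap(a: SV, b: SV, min_frac: float = 0.5) -> bool:
    """True if a and b share chromosome and type and each covers
    at least min_frac of the other (assumed matching rule)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared >= min_frac * (a.end - a.start)
            and shared >= min_frac * (b.end - b.start))


def child_only(child: List[SV], father: List[SV], mother: List[SV]) -> List[SV]:
    """Child calls with an alternate allele that match no parental call
    carrying an alternate allele."""
    parental = [sv for sv in father + mother if carries_alt(sv.gt)]
    return [
        c for c in child
        if carries_alt(c.gt)
        and not any(reciprocal_overlap(c, p) for p in parental)
    ]


# Toy data (made up): the chr1 deletion is shared with the father despite
# shifted breakpoints; the chr2 duplication is genotyped 0/0 in the mother.
child = [SV("chr1", 1000, 2000, "DEL", "0/1"), SV("chr2", 5000, 9000, "DUP", "0/1")]
father = [SV("chr1", 1100, 2050, "DEL", "0/1")]
mother = [SV("chr2", 5000, 9000, "DUP", "0/0")]
candidates = child_only(child, father, mother)
```

With an exact-coordinate comparison instead of the overlap rule, the chr1 deletion in this toy example would also be reported as child-only, so the matching rule alone can change the candidate count.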

A related question concerns the relationship between false positive de novo calls and the filtration steps. In my understanding, applying strict filtration at the individual level yields more false positive de novo calls. For example, if a mother transmits an SV to her child and our filtration is so strict that we fail to detect this SV in the mother, we will classify the child's SV as de novo, which is a false positive. So I'm puzzled by the strict "DENOVO_FILTER" option in SV2. Could you help me understand the filtration steps? Thanks in advance for your help!
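The concern above can be shown with a toy example. This is only a sketch of the reasoning, assuming a single per-call quality score and a single cutoff applied identically to every individual; the scores, thresholds, and function names are hypothetical and say nothing about how SV2's DENOVO_FILTER is actually implemented.

```python
def call_passes(score: float, threshold: float) -> bool:
    """A call is kept only if its (hypothetical) quality score meets the cutoff."""
    return score >= threshold


def looks_de_novo(child_score: float, father_score: float,
                  mother_score: float, threshold: float) -> bool:
    """The SV is labeled de novo when the child's call survives the filter
    and neither parent's call does."""
    return (call_passes(child_score, threshold)
            and not call_passes(father_score, threshold)
            and not call_passes(mother_score, threshold))


# A truly inherited SV: the mother carries it but her call has weak support
# (score 15); the father is a non-carrier (score 0). All values are made up.
child_score, father_score, mother_score = 30.0, 0.0, 15.0

lenient_result = looks_de_novo(child_score, father_score, mother_score, threshold=10.0)
strict_result = looks_de_novo(child_score, father_score, mother_score, threshold=20.0)
# lenient_result is False: the mother's call is kept, so the SV is seen as inherited.
# strict_result is True: the mother's call is filtered out, so the inherited SV
# is mislabeled as de novo.
```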

Best regards, Weichen Song

WeiCSong • Dec 31 '19 05:12