graphtyper icon indicating copy to clipboard operation
graphtyper copied to clipboard

Minimum read length needed for genotype_sv?

Open martinghunt opened this issue 3 years ago • 5 comments

I've been running tests on various data sets, and have had a few cases where graphtyper genotype_sv makes VCFs with almost every site having zero depth and a genotypes of "./.". Whereas the identical command for graphtyper genotype results in the calls I would expect. Example output:

NC_000962.3 558 NC_000962.3:558:SG G T 0.0 LowQUAL ABHet=-1;ABHom=-1;AC=0;AF=0;AN=0;CR=0;LOGF=nan;MQ=255;MaxAAS=0;MaxAASR=0;NHet=0;NHomAlt=0;NHomRef=0;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=0;RefLen=1;SB=-1;SBAlt=-1;SBF=0,0;SBF1=0,0;SBF2=0,0;SBR=0,0;SBR1=0,0;SBR2=0,0;SeqDepth=0;VarType=SG GT:AD:MD:DP:GQ:PL ./.:0,0:0:0:0:0,0,0

The only common thing I can find between my data sets is that this happens precisely on my 75bp reads data sets, never on my 100bp or longer reads. Is this expected? If so, are there any options that would get it working with 75bp reads?

martinghunt avatar Nov 16 '20 14:11 martinghunt

Hello, normally the minimum read length for paired end reads is 63 bp and I don't remember setting a stricter criteria on the read length in genotype_sv but maybe I did.. can you share with me a small example sam/bam so I can reproduce the problem?

Also, since your example is a SNP, I don't recommend calling SNPs or indels in genotype_sv, only SVs. Use graphtyper genotype for calling snp/indels.

Best, Hannes

hannespetur avatar Nov 17 '20 11:11 hannespetur

I'll prepare a minimal test case.

So our use case is we have a mix of all our variants together (SNPs, indels, SVs...). How would you recommend to use genotype and genotype_sv to make a call set in that case? We ultimately want a single final VCF that contains everything.

martinghunt avatar Nov 17 '20 12:11 martinghunt

I would call both genotype (SNP indels only) and genotype_sv (SVs only) and create a final VCF with everything using bcftools concat + bcftools sort

hannespetur avatar Nov 17 '20 12:11 hannespetur

ok, thanks. How do you define SV? Any VCF record where length REF or length ALT is >=50?

martinghunt avatar Nov 17 '20 13:11 martinghunt

GraphTyper considers it an SV if the absolute size difference between REF and ALT is >= 50bp. This is done after splitting the variants into biallelic variants if they were multiallelic.

hannespetur avatar Nov 17 '20 14:11 hannespetur