graphtyper
graphtyper copied to clipboard
Minimum read length needed for genotype_sv?
I've been running tests on various data sets, and have had a few cases where graphtyper genotype_sv
makes VCFs with almost every site having zero depth and a genotypes of "./.". Whereas the identical command for graphtyper genotype
results in the calls I would expect. Example output:
NC_000962.3 558 NC_000962.3:558:SG G T 0.0 LowQUAL ABHet=-1;ABHom=-1;AC=0;AF=0;AN=0;CR=0;LOGF=nan;MQ=255;MaxAAS=0;MaxAASR=0;NHet=0;NHomAlt=0;NHomRef=0;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=0;RefLen=1;SB=-1;SBAlt=-1;SBF=0,0;SBF1=0,0;SBF2=0,0;SBR=0,0;SBR1=0,0;SBR2=0,0;SeqDepth=0;VarType=SG GT:AD:MD:DP:GQ:PL ./.:0,0:0:0:0:0,0,0
The only common thing I can find between my data sets is that this happens precisely on my 75bp reads data sets, never on my 100bp or longer reads. Is this expected? If so, are there any options that would get it working with 75bp reads?
Hello, normally the minimum read length for paired end reads is 63 bp and I don't remember setting a stricter criteria on the read length in genotype_sv but maybe I did.. can you share with me a small example sam/bam so I can reproduce the problem?
Also, since your example is a SNP, I don't recommend calling SNPs or indels in genotype_sv
, only SVs. Use graphtyper genotype
for calling snp/indels.
Best, Hannes
I'll prepare a minimal test case.
So our use case is we have a mix of all our variants together (SNPs, indels, SVs...). How would you recommend to use genotype
and genotype_sv
to make a call set in that case? We ultimately want a single final VCF that contains everything.
I would call both genotype
(SNP indels only) and genotype_sv
(SVs only) and create a final VCF with everything using bcftools concat
+ bcftools sort
ok, thanks. How do you define SV? Any VCF record where length REF or length ALT is >=50?
GraphTyper considers it an SV if the absolute size difference between REF and ALT is >= 50bp. This is done after splitting the variants into biallelic variants if they were multiallelic.