SVLEN missing from gCNV output
Hello,
I am running GermlineCNVCaller and PostprocessGermlineCNVCalls (GATK v4.2.5) for CNV analysis on our targeted capture data.
My output segment VCFs have no SVLEN or SVTYPE values, although both fields are declared in the header.
The INFO lines in the header include:
##INFO=<ID=AC_Orig,Number=A,Type=Integer,Description="Original AC">
##INFO=<ID=AF_Orig,Number=A,Type=Float,Description="Original AF">
##INFO=<ID=AN_Orig,Number=1,Type=Integer,Description="Original AN">
##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate of the variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
But the data rows in the actual VCF output only contain the END field.
Example output:
13 32839931 CNV_13_32839931_32945267 N . 3076.53 . END=32945267 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:63:169:3077:523:342
13 32950659 CNV_13_32950659_32954345 N <DEL> 3076.53 . END=32954345 GT:CN:NP:QA:QS:QSE:QSS 0/1:1:7:709:3077:709:831
13 32968699 CNV_13_32968699_73961012 N . 3076.53 . END=73961012 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:14:210:3077:295:630
14 24883828 CNV_14_24883828_94854954 N . 3076.53 . END=94854954 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:72:100:3077:287:299
15 32992921 CNV_15_32992921_91535389 N . 3076.53 . END=91535389 GT:CN:NP:QA:QS:QSE:QSS 0/0:2:35:102:3077:198:331
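One way to confirm which declared INFO fields never appear in the data rows is a small script along these lines. It is only a sketch: the function names are made up for illustration, and it assumes standard tab-separated VCF records (the rows pasted above have had their tabs flattened to spaces).

```python
import re

# Matches header lines such as ##INFO=<ID=SVLEN,Number=.,...>
INFO_HEADER = re.compile(r"^##INFO=<ID=([^,>]+)")


def info_keys_declared(header_lines):
    """Return the set of INFO IDs declared in the ## header lines."""
    declared = set()
    for line in header_lines:
        match = INFO_HEADER.match(line)
        if match:
            declared.add(match.group(1))
    return declared


def info_keys_used(record_lines):
    """Return the set of INFO keys that appear in the data rows."""
    used = set()
    for line in record_lines:
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        if info == ".":
            continue
        for entry in info.split(";"):
            used.add(entry.split("=", 1)[0])
    return used


def unused_info_keys(vcf_lines):
    """INFO IDs that are declared in the header but absent from every record."""
    lines = list(vcf_lines)
    header = [l for l in lines if l.startswith("##")]
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    return info_keys_declared(header) - info_keys_used(records)
```

Calling `unused_info_keys(open("sample_0_segments.vcf"))` (the file name is hypothetical) on output like the above would be expected to list SVLEN and SVTYPE, since only END is populated.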
The commands I am running are below:
docker run -v /home/dnanexus/inputs:/data $GATK_image gatk GermlineCNVCaller \
-L /data/beds/filtered.interval_list -imr OVERLAPPING_ONLY \
--annotated-intervals /data/beds/annotated_intervals.tsv \
--run-mode COHORT \
$batch_input \
--contig-ploidy-calls /data/ploidy-dir/ploidy-calls/ \
--output-prefix CNV \
-O /data/gCNV-dir
parallel --jobs 8 '/usr/bin/time -v docker run -v /home/dnanexus/inputs:/data $GATK_image \
gatk PostprocessGermlineCNVCalls \
--sample-index {} \
--autosomal-ref-copy-number 2 \
--allosomal-contig X \
--allosomal-contig Y \
--contig-ploidy-calls /data/ploidy-dir/ploidy-calls \
--calls-shard-path /data/gCNV-dir/CNV-calls \
--model-shard-path /data/gCNV-dir/CNV-model \
--output-genotyped-intervals /data/vcfs/sample_{}_intervals.vcf \
--output-genotyped-segments /data/vcfs/sample_{}_segments.vcf \
--output-denoised-copy-ratios /data/vcfs/sample_{}_denoised_copy_ratios.tsv'
Is there an option to output SVLEN, or is it expected to be in the VCF output by default?
Many thanks,
Adriana :)
Bump - I've added a post-processing stage to our in-house pipeline, which runs this tool, as a workaround. It is strange to have SVLEN declared in the header but absent from the data rows. Could this be added to the output by default?
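For anyone needing the same workaround, a post-processing step of this kind could look roughly like the sketch below. This is not the in-house code mentioned above and not GATK's behaviour; the function names are hypothetical. It assumes that the segment length is the inclusive span `END - POS + 1`, that SVLEN should be negative for `<DEL>` and positive for `<DUP>`, and that rows whose ALT is `.` should be left unchanged.

```python
def annotate_record(line):
    """Append SVTYPE and SVLEN to the INFO column of one VCF data row.

    Assumption (not taken from GATK): length is the inclusive span
    END - POS + 1, written as negative for <DEL> and positive for <DUP>.
    Rows with any other ALT value are returned unchanged.
    """
    row = line.rstrip("\n")
    fields = row.split("\t")
    pos, alt, info = int(fields[1]), fields[4], fields[7]
    if alt not in ("<DEL>", "<DUP>"):
        return row
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    if "END" not in info_map or "SVLEN" in info_map:
        return row  # nothing to compute from, or already annotated
    span = int(info_map["END"]) - pos + 1
    svlen = -span if alt == "<DEL>" else span
    fields[7] = f"{info};SVTYPE={alt.strip('<>')};SVLEN={svlen}"
    return "\t".join(fields)


def annotate_vcf(lines):
    """Yield VCF lines, passing header lines through and annotating data rows."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line.rstrip("\n")
        else:
            yield annotate_record(line)
```

Applied to the `<DEL>` segment in the example output, this would append `SVTYPE=DEL;SVLEN=-3687` after `END=32954345`. If your downstream tools expect a different length convention (for example `END - POS`, or unsigned lengths), adjust the `span` and `svlen` lines accordingly.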