
SVLEN missing from gCNV output

Open · Addy81 opened this issue 3 years ago · 1 comment

Hello,

I am running GermlineCNVCaller and PostprocessGermlineCNVCalls (GATK v4.2.5) for CNV analysis on our targeted capture data.

My output segment VCFs have no SVLEN or SVTYPE values, although both fields are declared in their headers.

The INFO lines in the header include:

##INFO=<ID=AC_Orig,Number=A,Type=Integer,Description="Original AC">
##INFO=<ID=AF_Orig,Number=A,Type=Float,Description="Original AF">
##INFO=<ID=AN_Orig,Number=1,Type=Integer,Description="Original AN">
##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate of the variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">

But the actual VCF records contain only the END field.

Example output:

13	32839931	CNV_13_32839931_32945267	N	.	3076.53	.	END=32945267	GT:CN:NP:QA:QS:QSE:QSS	0/0:2:63:169:3077:523:342
13	32950659	CNV_13_32950659_32954345	N	<DEL>	3076.53	.	END=32954345	GT:CN:NP:QA:QS:QSE:QSS	0/1:1:7:709:3077:709:831
13	32968699	CNV_13_32968699_73961012	N	.	3076.53	.	END=73961012	GT:CN:NP:QA:QS:QSE:QSS	0/0:2:14:210:3077:295:630
14	24883828	CNV_14_24883828_94854954	N	.	3076.53	.	END=94854954	GT:CN:NP:QA:QS:QSE:QSS	0/0:2:72:100:3077:287:299
15	32992921	CNV_15_32992921_91535389	N	.	3076.53	.	END=91535389	GT:CN:NP:QA:QS:QSE:QSS	0/0:2:35:102:3077:198:331
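The mismatch between the header and the records can be checked programmatically. The following is a minimal sketch, not part of GATK: the function names are hypothetical, and it assumes a plain-text, tab-delimited VCF whose INFO column is the eighth field.

```python
import re


def declared_info_ids(header_lines):
    """Collect the IDs of all ##INFO header lines."""
    ids = set()
    for line in header_lines:
        match = re.match(r"##INFO=<ID=([^,>]+)", line)
        if match:
            ids.add(match.group(1))
    return ids


def used_info_keys(record_lines):
    """Collect every INFO key that appears in at least one data row."""
    keys = set()
    for line in record_lines:
        info = line.rstrip("\n").split("\t")[7]
        if info != ".":
            keys.update(item.split("=", 1)[0] for item in info.split(";"))
    return keys


def unused_info_ids(vcf_lines):
    """Return INFO IDs declared in the header but absent from all records."""
    lines = list(vcf_lines)
    header = [l for l in lines if l.startswith("##")]
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    return declared_info_ids(header) - used_info_keys(records)
```

Run against the header and records shown above, this would report SVLEN and SVTYPE (along with the `*_Orig` fields) as declared but never populated.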

The commands I am running are below:

docker run -v /home/dnanexus/inputs:/data $GATK_image gatk GermlineCNVCaller \
        -L /data/beds/filtered.interval_list -imr OVERLAPPING_ONLY \
        --annotated-intervals /data/beds/annotated_intervals.tsv \
        --run-mode COHORT \
        $batch_input \
        --contig-ploidy-calls /data/ploidy-dir/ploidy-calls/ \
        --output-prefix CNV \
        -O /data/gCNV-dir

parallel --jobs 8 '/usr/bin/time -v docker run -v /home/dnanexus/inputs:/data $GATK_image \
        gatk PostprocessGermlineCNVCalls \
        --sample-index {} \
        --autosomal-ref-copy-number 2 \
        --allosomal-contig X \
        --allosomal-contig Y \
        --contig-ploidy-calls /data/ploidy-dir/ploidy-calls \
        --calls-shard-path /data/gCNV-dir/CNV-calls \
        --model-shard-path /data/gCNV-dir/CNV-model \
        --output-genotyped-intervals /data/vcfs/sample_{}_intervals.vcf \
        --output-genotyped-segments /data/vcfs/sample_{}_segments.vcf \
        --output-denoised-copy-ratios /data/vcfs/sample_{}_denoised_copy_ratios.tsv'

Is there an option to output SVLEN, or should it already be in the VCF output?

Many thanks,

Adriana :)

Addy81 · Jul 27 '22 15:07

Bump - I've added a post-processing stage to our in-house pipeline, which runs this tool, to work around this. It is strange to have SVLEN declared in the header but absent from the data rows. Could we have it added to the output by default?

MattWellie · Feb 27 '24 02:02
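For readers who need a workaround in the meantime, a post-processing step of the kind mentioned above might look roughly like the sketch below. This is not the in-house implementation referred to in the thread, and it is not GATK behaviour. The function names are hypothetical, and it makes two assumptions that should be checked against your own conventions: the length is taken as `END - POS` (use `END - POS + 1` if POS is treated as the first affected base), and SVLEN is written as negative for `<DEL>` and positive for `<DUP>`. Rows with any other ALT value (such as `.`) are left unchanged.

```python
def annotate_record(line: str) -> str:
    """Append SVTYPE and SVLEN to the INFO column of one VCF data row."""
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    pos, alt, info = int(fields[1]), fields[4], fields[7]

    # Only single symbolic <DEL>/<DUP> alleles are handled (assumption).
    svtype = {"<DEL>": "DEL", "<DUP>": "DUP"}.get(alt)
    if svtype is None:
        return stripped

    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    if "END" not in info_map or "SVLEN" in info_map:
        return stripped  # nothing to compute, or already annotated

    # Assumed convention: length = END - POS, negative for deletions.
    length = int(info_map["END"]) - pos
    svlen = -length if svtype == "DEL" else length
    fields[7] = f"{info};SVTYPE={svtype};SVLEN={svlen}"
    return "\t".join(fields)


def annotate_vcf(lines):
    """Yield VCF lines with data rows annotated and header lines untouched."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line.rstrip("\n")
        else:
            yield annotate_record(line)
```

Applied to the `<DEL>` row in the example output, this would turn `END=32954345` into `END=32954345;SVTYPE=DEL;SVLEN=-3686`. Because the header already declares both fields, no header changes are needed.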