gatk
Fix bug in FilterMutectCalls 'haplotype' filter by checking PGT and make --linked-de-bruijn-graph default
This request was created from a contribution made by Francesco Mazzarotto on March 23, 2022 14:16 UTC.
--
Hello,
I am using GATK v4.2.5.0 to process tumor-only samples sequenced with WES.
In one sample, a variant that was confirmed by Sanger sequencing (chr14-45137087-C-T) is filtered out as non-PASS, in part because of the 'haplotype' filter. As far as the 'haplotype' filter is concerned, the variant responsible seems to be another SNP 3 bp upstream (chr14-45137084-C-T). No other variants are called within 100 bp of the Sanger-validated one (see below).
chr14 45136964 . C T . haplotype;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=3,0|1,0;DP=4;ECNT=2;GERMQ=25;MBQ=41,37;MFRL=360,390;MMQ=60,60;MPOS=69;POPAF=7.30;ROQ=17;TLOD=3.20 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:3,1:0.333:4:1,0:1,1:0|1:45136962_C_T:45136962:3,0,1,0
chr14 45137084 . C T . germline;haplotype;panel_of_normals AS_FilterStatus=SITE;AS_SB_TABLE=9,1|12,5;DP=27;ECNT=2;GERMQ=1;MBQ=41,41;MFRL=297,326;MMQ=60,60;MPOS=45;PON;POPAF=0.830;ROQ=90;TLOD=59.93 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:10,17:0.615:27:4,13:4,4:0|1:45137084_C_T:45137084:9,1,12,5
chr14 45137087 . C T . germline;haplotype AS_FilterStatus=SITE;AS_SB_TABLE=12,5|9,1;DP=27;ECNT=2;GERMQ=1;MBQ=41,41;MFRL=326,297;MMQ=60,60;MPOS=44;POPAF=2.33;ROQ=93;TLOD=31.76 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 1|0:17,10:0.385:27:13,4:4,6:1|0:45137084_C_T:45137084:12,5,9,1
chr14 45149295 . AC A . haplotype;weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=0,0|0,0;DP=1;ECNT=2;GERMQ=8;MBQ=0,27;MFRL=0,407;MMQ=60,60;MPOS=15;POPAF=7.30;ROQ=93;TLOD=4.20 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:0,1:0.667:1:0,1:0,0:0|1:45149295_AC_A:45149295:0,0,0,1
However, the two variants have different PGT tags (0|1 and 1|0). From the --distance-on-haplotype documentation, I gathered that the 'haplotype' filter is assigned only to variants that share both the PGT and PID tags with a variant already filtered for other reasons, and that lie within 100 bp (the default) of it.
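As a rough illustration of the reading above (not GATK's actual implementation), the following Python sketch compares the PGT and PID tags of the two records at 45137084 and 45137087. The INFO column is abbreviated to `.` to keep the lines short, and the helper names (`parse_call`, `same_phase_set`, `same_haplotype`) are hypothetical.

```python
from dataclasses import dataclass

FORMAT = "GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB"

# Records taken from the post; INFO is replaced by "." (an abbreviation, not real output).
REC_084 = ("chr14 45137084 . C T . germline;haplotype;panel_of_normals . "
           + FORMAT + " 0|1:10,17:0.615:27:4,13:4,4:0|1:45137084_C_T:45137084:9,1,12,5")
REC_087 = ("chr14 45137087 . C T . germline;haplotype . "
           + FORMAT + " 1|0:17,10:0.385:27:13,4:4,6:1|0:45137084_C_T:45137084:12,5,9,1")


@dataclass(frozen=True)
class Call:
    pos: int   # 1-based position (VCF POS column)
    pgt: str   # phased genotype tag from the sample column
    pid: str   # phase-set identifier tag from the sample column


def parse_call(line: str) -> Call:
    """Parse one whitespace-separated, single-sample VCF record (hypothetical helper)."""
    fields = line.split()
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return Call(pos=int(fields[1]), pgt=sample["PGT"], pid=sample["PID"])


def same_phase_set(a: Call, b: Call, max_distance: int = 100) -> bool:
    """PID-only comparison: same phase set and within max_distance bp."""
    return a.pid == b.pid and abs(a.pos - b.pos) <= max_distance


def same_haplotype(a: Call, b: Call, max_distance: int = 100) -> bool:
    """Comparison as described in the post: PID and PGT must both match."""
    return same_phase_set(a, b, max_distance) and a.pgt == b.pgt


a, b = parse_call(REC_084), parse_call(REC_087)
print(same_phase_set(a, b))   # True: both records carry PID 45137084_C_T, 3 bp apart
print(same_haplotype(a, b))   # False: PGT is 0|1 for one record and 1|0 for the other
```

Under this reading, a comparison that looks only at PID would treat the two calls as linked, while one that also requires matching PGT would not.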
The command that I used is the following:
gatk FilterMutectCalls \
-R $RefGenome \
-V $TempSampleDir/$SampleName.unfiltered.vcf \
--tumor-segmentation $TempSampleDir/$SampleName.segments_table \
--contamination-table $TempSampleDir/$SampleName.contamination_table \
--ob-priors $TempSampleDir/$SampleName.tumor_artifact_prior.tar.gz \
-O $TempSampleDir/$SampleName.filtered.vcf
Am I missing something, or is the 'haplotype' filter value mistakenly assigned to these two variants?
(created from Zendesk ticket #277843)
Hi, regarding making --linked-de-bruijn-graph the default: I recently ran Mutect2 (GATK v4.2.6.1) with that option on a larger cohort of samples, some of which had variant calls from a previous Mutect2 run (GATK v4.1.0.0) without it. I noticed that many known cancer drivers (e.g. KRAS p.G12C, PIK3CA p.E545* or BRAF p.V600*) that were present in a substantial fraction of samples (roughly >10%) in the old calls were completely absent from the new calls. I had to add the option --recover-all-dangling-branches to recover those known hotspot mutations. They all have ample coverage (on the order of 100x) and high VAF, which makes them obvious true positives.
I'd expect Mutect2 to always be able to call the presence or absence of known hotspot mutations, so you should either test and debug the linked de Bruijn graph option further, or also make recovering all dangling branches the default.