gatk icon indicating copy to clipboard operation
gatk copied to clipboard

ReblockGVCF bug: first position on contig missing

Open ldgauthier opened this issue 2 years ago • 2 comments

Bug Report

Affected tool(s) or class(es)

ReblockGVCF

Affected version(s)

  • [x] Latest public release version [4.2.6.1]
  • [x] Latest master branch (probably)

Description

First position on a contig can be missing if that position is low quality in the input GVCF

Steps to reproduce

Run Reblock GVCF with the following parameters: --floor-blocks true --gvcf-gq-bands 20 --gvcf-gq-bands 30 --gvcf-gq-bands 40 --do-qual-score-approximation true --variant $inputVC -R $hg38

where inputVC contains chr13 18173860 . A C,<NON_REF> 0 . AS_RAW_BaseQRankSum=|-4.9,1|NaN;AS_RAW_MQ=51256.00|4709.00|0.00;AS_RAW_MQRankSum=|0.5,1|NaN;AS_RAW_ReadPosRankSum=|1.2,1|NaN;AS_SB_TABLE=23,2|1,1|0,0;BaseQRankSum=-4.896;DP=28;ExcessHet=0.0000;MLEAC=0,0;MLEAF=0.00,0.00;MQRankSum=0.564;RAW_MQandDP=59565,28;ReadPosRankSum=1.252 GT:AD:DP:GQ:PL:SB 0/0:25,2,0:27:61:0,61,946,75,951,965:23,2,1,1 appears to be dropped in output

Full input GVCF at gs://broad-dsde-methods-gauthier/reblocking-bug/

Expected behavior

QUAL 0 VC should be replaced with a GQ0 reference block

Actual behavior

Output GVCF is missing position chr13:18173860 and fails ValidateVCF task

ldgauthier avatar Jun 08 '22 15:06 ldgauthier

See also https://broadinstitute.atlassian.net/browse/PO-47848

ldgauthier avatar Jun 08 '22 15:06 ldgauthier

https://broadinstitute.atlassian.net/browse/PO-51350 shows the same outcome; the reblocked GVCF is missing the first variant of chromosome 12.

ReblockGVCF was run with the following options: -do-qual-approx --floor-blocks -GQB 20 -GQB 30 -GQB 40

The site that was dropped: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NWD831644 chr12 10001 . C T,<NON_REF> 0 . DP=10;ExcessHet=3.0103;MLEAC=0,0;MLEAF=0,0;RAW_MQ=13462 GT:AD:DP:GQ:PL:SB 0/0:7,0,0:7:23:0,23,130,23,130,130:5,2,0,0

QUAL = 0, but GQ is 23; it was called as a T non-ref, but the GT is 0/0, so this might be some upstream bug? Perhaps from haplotype caller? Maybe someone with more experience will know how to handle cases like this.

tmelman avatar Aug 01 '22 19:08 tmelman