gatk
Handling low-quality bases in AD calculation
Hi,
I found that even after adding --min-base-quality-score 20 to the HaplotypeCaller command line, low-quality bases are still counted in the AD field of the GVCF.
gatk version: 4.6.0.0 (GATK4 docker image)
command:
gatk HaplotypeCaller -R hg38.chr17.fna -I chr17.bqsr.hg38.bam -O chr17.g.vcf.gz --dbsnp dbSNP.hg38.vcf.gz -ploidy 2 --max-alternate-alleles 2 --dont-use-soft-clipped-bases --min-base-quality-score 20 --base-quality-score-threshold 20 --minimum-mapping-quality 10 -ERC GVCF
Base qualities at one position (samtools mpileup output):
BQSR BAM file (input):
chr17 3648932 N 47 GaGGcGgcGGcGcGGGcGcccaaaaGGGGGcaGGcGcgcGtcccacg ?5???!'??9+?+999!9!!+55559?999+!?9!!!E+9!!!!5!S
BAMOUT:
chr17 3648932 N 50 GGGGGGGGGGGGGGGGGGGGcctcaacccccaacggcacccaccagaccg !!?!!!?!!?!?!??5!!!?!!!+55!!!+!55?''+5!?!!++5BBB!I
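For reference, here is a minimal sketch of how the per-base counts at this position can be tallied from the two mpileup records above, keeping only bases with Phred quality of at least 20. It assumes Phred+33 encoded qualities and a pileup made without a reference (column 3 is N), so every call is a literal base letter. The function names are my own for illustration. This is a plain pileup tally and does not attempt to reproduce how HaplotypeCaller assigns reads to alleles.

```python
from collections import Counter

# Base and quality columns copied from the two mpileup records in this post.
INPUT_BASES = "GaGGcGgcGGcGcGGGcGcccaaaaGGGGGcaGGcGcgcGtcccacg"
INPUT_QUALS = "?5???!'??9+?+999!9!!+55559?999+!?9!!!E+9!!!!5!S"
BAMOUT_BASES = "GGGGGGGGGGGGGGGGGGGGcctcaacccccaacggcacccaccagaccg"
BAMOUT_QUALS = "!!?!!!?!!?!?!??5!!!?!!!+55!!!+!55?''+5!?!!++5BBB!I"


def iter_pileup_calls(bases):
    """Yield one call per read, skipping mpileup markers that carry no quality."""
    i, n = 0, len(bases)
    while i < n:
        ch = bases[i]
        if ch == "^":        # read-start marker, followed by one mapping-quality char
            i += 2
        elif ch == "$":      # read-end marker
            i += 1
        elif ch in "+-":     # indel: sign, length digits, then that many bases
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            yield ch
            i += 1


def count_bases(bases, quals, min_bq=20, offset=33):
    """Count calls (strand-insensitive) whose base quality is >= min_bq."""
    calls = list(iter_pileup_calls(bases))
    if len(calls) != len(quals):
        raise ValueError("number of calls and qualities differ")
    counts = Counter()
    for base, qual_char in zip(calls, quals):
        if ord(qual_char) - offset >= min_bq:
            counts[base.upper()] += 1
    return counts


if __name__ == "__main__":
    for label, b, q in (("input", INPUT_BASES, INPUT_QUALS),
                        ("bamout", BAMOUT_BASES, BAMOUT_QUALS)):
        counts = count_bases(b, q)
        print(label, dict(sorted(counts.items())), "total:", sum(counts.values()))
```

With a threshold of 20 this leaves 28 bases in the input BAM record (2 of them C) and 20 in the BAMOUT record (3 of them C).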
Line in GVCF:
chr17 3648932 rs1555561049 G A,C,<NON_REF> 90.64 . BaseQRankSum=-0.419;DB;DP=53;ExcessHet=0.0000;MLEAC=0,1,0;MLEAF=0.00,0.500,0.00;MQRankSum=0.000;RAW_MQandDP=190800,53;ReadPosRankSum=0.352 GT:AD:DP:GQ:PL:SB 0/2:23,7,16,0:46:24:98,24,381,0,241,336,161,402,378,539:20,3,0,23
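To make the AD field easier to read, here is a small sketch that pairs each allele in the GVCF record above with its AD value. It assumes the record is split on whitespace, as pasted here (a real VCF is tab-separated), and that the FORMAT column contains AD and DP. The helper name is hypothetical.

```python
# The GVCF record from this post, whitespace-separated as pasted.
GVCF_LINE = (
    "chr17 3648932 rs1555561049 G A,C,<NON_REF> 90.64 . "
    "BaseQRankSum=-0.419;DB;DP=53;ExcessHet=0.0000;MLEAC=0,1,0;"
    "MLEAF=0.00,0.500,0.00;MQRankSum=0.000;RAW_MQandDP=190800,53;"
    "ReadPosRankSum=0.352 "
    "GT:AD:DP:GQ:PL:SB "
    "0/2:23,7,16,0:46:24:98,24,381,0,241,336,161,402,378,539:20,3,0,23"
)


def ad_by_allele(line):
    """Return ({allele: AD}, FORMAT DP) for a single-sample VCF record."""
    fields = line.split()
    ref, alt = fields[3], fields[4]
    # Pair FORMAT keys (GT, AD, DP, ...) with the sample's values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    alleles = [ref] + alt.split(",")
    depths = [int(x) for x in sample["AD"].split(",")]
    return dict(zip(alleles, depths)), int(sample["DP"])


if __name__ == "__main__":
    per_allele, format_dp = ad_by_allele(GVCF_LINE)
    for allele, depth in per_allele.items():
        print(allele, depth)
    print("sum(AD) =", sum(per_allele.values()), "FORMAT DP =", format_dp)
```

For this record the AD values are G=23, A=7, C=16 and <NON_REF>=0, which sum to the FORMAT DP of 46, while the INFO DP is 53.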
Based on the BAMOUT, I expected the DP to be 28 or 20, and the AD for cytosine (C) to be no more than 3. Is it expected behaviour that all bases are counted in the AD, regardless of quality?