
Handling low bases in AD calculation

Open pogodina-nadezda opened this issue 1 year ago • 1 comments

Hi,

I found that even after adding --min-base-quality-score 20 to the HaplotypeCaller command line, low-quality bases are still counted in the AD field of the GVCF.

GATK version: 4.6.0.0 (GATK4 docker image)

Command:
gatk HaplotypeCaller -R hg38.chr17.fna -I chr17.bqsr.hg38.bam -O chr17.g.vcf.gz --dbsnp dbSNP.hg38.vcf.gz -ploidy 2 --max-alternate-alleles 2 --dont-use-soft-clipped-bases --min-base-quality-score 20 --base-quality-score-threshold 20 --minimum-mapping-quality 10 -ERC GVCF

Base qualities at one position (samtools mpileup result):

BQSR BAM file (input):
chr17 3648932 N 47 GaGGcGgcGGcGcGGGcGcccaaaaGGGGGcaGGcGcgcGtcccacg ?5???!'??9+?+999!9!!+55559?999+!?9!!!E+9!!!!5!S

BAMOUT:
chr17 3648932 N 50 GGGGGGGGGGGGGGGGGGGGcctcaacccccaacggcacccaccagaccg !!?!!!?!!?!?!??5!!!?!!!+55!!!+!55?''+5!?!!++5BBB!I

Line in GVCF:
chr17 3648932 rs1555561049 G A,C,<NON_REF> 90.64 . BaseQRankSum=-0.419;DB;DP=53;ExcessHet=0.0000;MLEAC=0,1,0;MLEAF=0.00,0.500,0.00;MQRankSum=0.000;RAW_MQandDP=190800,53;ReadPosRankSum=0.352 GT:AD:DP:GQ:PL:SB 0/2:23,7,16,0:46:24:98,24,381,0,241,336,161,402,378,539:20,3,0,23

I expected the DP to be 28 or 20 according to the bamout, and the AD for Cytosine to not exceed 3. Is it expected behaviour that all bases are counted in the AD, regardless of quality?
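
The expected numbers can be reproduced from the two pileup lines with a minimal Python sketch. It assumes Phred+33 quality encoding and that the base column holds only plain base letters (no indel, read-start, or read-end markers), which holds for the lines shown here. `count_passing_bases` is a hypothetical helper written for this illustration, not part of GATK or samtools, and it only mimics a simple base-quality cutoff, not HaplotypeCaller's internal read filtering or realignment.

```python
from collections import Counter

# Base and quality columns copied from the samtools mpileup lines in this post.
INPUT_BASES = "GaGGcGgcGGcGcGGGcGcccaaaaGGGGGcaGGcGcgcGtcccacg"
INPUT_QUALS = "?5???!'??9+?+999!9!!+55559?999+!?9!!!E+9!!!!5!S"
BAMOUT_BASES = "GGGGGGGGGGGGGGGGGGGGcctcaacccccaacggcacccaccagaccg"
BAMOUT_QUALS = "!!?!!!?!!?!?!??5!!!?!!!+55!!!+!55?''+5!?!!++5BBB!I"


def count_passing_bases(bases, quals, min_bq=20, offset=33):
    """Count bases per allele whose Phred quality is >= min_bq.

    Assumes Phred+33 encoding and one quality character per base
    (no indel or read-boundary markers in the base string).
    """
    if len(bases) != len(quals):
        raise ValueError("base and quality strings differ in length")
    counts = Counter()
    for base, qual in zip(bases, quals):
        if ord(qual) - offset >= min_bq:
            counts[base.upper()] += 1  # merge forward and reverse strands
    return counts


for label, bases, quals in [
    ("input BAM", INPUT_BASES, INPUT_QUALS),
    ("BAMOUT", BAMOUT_BASES, BAMOUT_QUALS),
]:
    counts = count_passing_bases(bases, quals)
    print(label, dict(counts), "total:", sum(counts.values()))
```

With a cutoff of 20 this gives 28 passing bases in the input BAM (2 of them C) and 20 in the BAMOUT (3 of them C), which is where the expected DP of 28 or 20 and the expected C count of at most 3 come from.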

pogodina-nadezda, Oct 21 '24 09:10