gatk icon indicating copy to clipboard operation
gatk copied to clipboard

Audit MuTect2 headers to describe fragment vs read annotations

Open GATKSupportTeam opened this issue 2 years ago • 0 comments

This request was created from a contribution made by Brian Wiley on March 18, 2022 22:33 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360073119131-MuTect2-annotations#community_comment_4797484535835

--

394224746911, do you think it would be possible for Broad to start documenting things like this and most of the tags in your VCF files.  More like simple things not like the algorithm for clustered events.  For instance I have variant from which I can get the depth from the bam file with samtools with a variant called by Mutect at position chr17:60663118 C>T.  The depth of my bam file at this position is 170 (after removing duplicates; 222 before removing duplicates which is what shows in IGV).

$ samtools depth C484.TCGA-19-2620-10A-01D-1495-08.5_gdc_realn.bam -r chr17:60663117-60663119

chr17    60663117    172

chr17    60663118    170

chr17    60663119    173

This coincides exactly with IGV (also shows 170 for position 60663118 when remove duplicates is turned on).

$ samtools view C484.TCGA-19-2620-10A-01D-1495-08.5_gdc_realn.bam chr17:60663118-60663118 | wc -l

222

However Mutect has 2 depth DP tags, an INFO/DP and a FMT/DP.  The format DP shows a depth of 163 at this position which makes sense that it's at least lower.  But the INFO/DP shows a depth of DP=318 which makes almost zero sense (that's almost double!!) and in the VCF file it even indicates some reads are filtered...

##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">

Additionally the AS_SB_TABLE shows "164,135|3,2" in which the total is 304 which again makes almost zero sense.  So from a researcher perspective it is absolutely necessary to know how Mutect2 is calculating these numbers else the strand bias filter cannot be trusted at all.  I am using version 4.2.1.0 of GATK and below are the variant and the call information:

Variant

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TCGA-19-2620-10A-01D-1495-08

chr17   60663118        .       C       T       .       weak_evidence   AS_FilterStatus=weak_evidence;AS_SB_TABLE=164,135|3,2;DP=318;ECNT=1;GERMQ=93;MBQ=33,27;

MFRL=182,128;MMQ=60,60;MPOS=17;NALOD=1.99;NLOD=28.89;POPAF=6;ROQ=32;TLOD=6.59;AC=1;AN=2;MQ0=0;samtools_DP=170;SAMPLE=TCGA-19-2620-10A-01D-1495-08;PON_RefDepth=

1532;PON_AltDepth=0;PON_FISHER=0;CSQ=T|stop_gained|HIGH|PPM1D|ENSG00000170836|Transcript|ENST00000305921.8|protein_coding|6/6||ENST00000305921.8:c.1384C>T|ENSP

00000306682.2:p.Gln462Ter|1606|1384|462|Q/*|Caa/Taa|CM131995&COSV59955543||1||SNV|HGNC|HGNC:9277|YES|NM_003620.4||1|P1|CCDS11625.1|ENSP00000306682|O15297.184|A

0A0S2Z4M2.32|UPI0000130FE8|O15297-1||Ensembl||C|C||1||||||||||||||||||||||||||0&1|1&1||||||||MAGLYSLGVSVFSDQGGRKYMEDVTQIVVEPEPTAEEKPSPRRSLSQPLPPRPSPAALPGGEVSGK

GPAVAAREARDPLPDAGASPAPSRCCRRRSSVAFFAVCDGHGGREAAQFAREHLWGFIKKQKGFTSSEPAKVCAAIRKGFLACHLAMWKKLAEWPKTMTGLPSTSGTTASVVIIRGMKMYVAHVGDSGVVLGIQDDPKDDFVRAVEVTQDHKPELPKER

ERIEGLGGSVMNKSGVNRVVWKRPRLTHNGPVRRSTVIDQIPFLAVARALGDLWSYDFFSGEFVVSPEPDTSVHTLDPQKHKYIILGSDGLWNMIPPQDAISMCQDQEEKKYLMGEHGQSCAKMLVNRALGRWRQRMLRADNTSAIVICISPEVDNQGN

FTNEDELYLNLTDSPSYNSQETCVMTPSPCSTPPVKSLEEDPWPRVNSKDHIPALVRSNAFSENFLEVSAEIARENVQGVVIPSKDPEPLEENCAKALTLRIHDSLNNSLPIGLVPTNSTNTVMDQKNLKMSTPGQMKAQEIERTPPTNFKRTLEESNS

GPLMKKHRRNGLSRSSGAQPASLPTTSQRKNSVKLTMRRRLRGQKKIGNPLLHQHRKTVCVC||||||||||||||||||||||||||||||    GT:AD:AF:DP:F1R2:F2R1:SB        0/1:158,5:0.031:163:70,2:79,2:8

7,71,3,2

Call (your guys also sequenced this sample for the TCGA since the center is -08 TCGA-19-2620-10A-01D-1495-08):

NORMAL=$(samtools view -H $normal_bam | /usr/bin/perl -nE 'say $1 if /^\@RG.+\tSM:([ -~]+)/' | head -n 1)

TUMOR=$(samtools view -H $tumor_bam | /usr/bin/perl -nE 'say $1 if /^\@RG.+\tSM:([ -~]+)/' | head -n 1)

/gatk/gatk Mutect2 --java-options "-Xmx8g" -O $1 -R $2 -I $3 -tumor "$TUMOR" -I $4 -normal "$NORMAL" -L $5 --f1r2-tar-gz f1r2.tar.gz #Running Mutect2.

/gatk/gatk LearnReadOrientationModel -I f1r2.tar.gz -O artifact.priors.tar.gz

/gatk/gatk FilterMutectCalls --java-options "-Xmx8g" -R $2 -V $1 -O $6 -ob-priors artifact.priors.tar.gz #Running FilterMutectCalls on the output vcf.

(created from Zendesk ticket #278802)
gz#278802

GATKSupportTeam avatar Jun 15 '22 20:06 GATKSupportTeam