gatk
Funcotator hg19 database 1.6 vs 1.7: EGFR annotated with the wrong ENST in 1.7, but correct in 1.6
Dear all, I just updated the Funcotator database for hg19 from 1.6 to 1.7, and now I see a wrong ENST assignment for EGFR. I want Funcotator to use ENST00000275493, which corresponds to NM_005228.3 (oncokb.org), so I listed this ENST in the --transcript-list file when calling Funcotator with --ref-version hg19. For some reason this works with the 1.6 database, but not with the 1.7 database. With 1.7 I get the correct NM_ID in the MAF file, but a different ENST (ENST00000455089) and therefore different protein changes.
Kind regards, Daniel
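A quick way to see which transcript each MAF row actually used is to compare the Annotation_Transcript column against the IDs in the --transcript-list file, ignoring version numbers. Below is a minimal bash/awk sketch. check_transcripts is a hypothetical helper, and it assumes that MAF comment lines start with #, that the first remaining line is the header containing a column named Annotation_Transcript, and that the list holds one transcript ID per line. Everything from the first dot is dropped before comparing, so both the version (.4) and the _N suffix seen in the 1.7 output (e.g. ENST00000613647.4_1) are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print gene + transcript for MAF rows whose Annotation_Transcript
# is NOT in the transcript list (version numbers and "_N" suffixes are ignored).
# Assumptions: MAF comment lines start with '#', the first other line is the header
# and contains "Annotation_Transcript"; the list file has one transcript ID per line.
check_transcripts() {
  local list="$1" maf="$2"
  awk -F '\t' '
    # ENST00000613647.4_1 -> ENST00000613647 (drop everything from the first dot)
    function base(id) { sub(/\..*$/, "", id); return id }

    # First file: collect the wanted transcript IDs.
    FILENAME == ARGV[1] { if ($1 != "") wanted[base($1)] = 1; next }

    # Second file (MAF): skip comments and blank lines.
    /^#/ || NF == 0 { next }

    # Header row: remember which column holds Annotation_Transcript.
    !col {
      for (i = 1; i <= NF; i++) if ($i == "Annotation_Transcript") col = i
      next
    }

    # Data rows: report transcripts that are not in the list.
    {
      t = base($col)
      if (!(t in wanted)) print $1 "\t" $col
    }
  ' "$list" "$maf"
}
```

Usage: check_transcripts transcripts.txt sample.maf (placeholder file names). Every printed line is a row that was annotated on a transcript outside the list, which is how the ENST00000455089 rows would show up.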
I have an example for FGFR3 when using the --transcript-selection-mode ALL parameter.
I just used the .maf output and selected the columns like this (output at the end):
awk -F "\t" '{print $1";"$36";"$154";"$9";"$10";"$35";"$38";"$42";"$80";"$81";"$82";https://www.ncbi.nlm.nih.gov/snp/?term="$14}' $1 | grep -v "^#\|Silent"
ENST00000260795 is not listed, although it is present in the default TranscriptFile.txt that comes with Funcotator DB 1.7. I am also puzzled that every ENST has the same NM_ID. Am I using the wrong column here?
FGFR3;ENST00000613647.4_1;NM_000142;3'UTR;SNP;g.chr4:1807894G>A;15;;0.999;654;0;https://www.ncbi.nlm.nih.gov/snp/?term=7688609
FGFR3;ENST00000481110.6_4;NM_000142;3'UTR;DEL;g.chr4:1809111_1809112delTG;17;;0.093;64;625;https://www.ncbi.nlm.nih.gov/snp/?term=754209375
FGFR3;ENST00000440486.7_2;NM_000142;3'UTR;DEL;g.chr4:1809111_1809112delTG;18;;0.093;64;625;https://www.ncbi.nlm.nih.gov/snp/?term=754209375
FGFR3;ENST00000340107.8_1;NM_000142;3'UTR;DEL;g.chr4:1809111_1809112delTG;18;;0.093;64;625;https://www.ncbi.nlm.nih.gov/snp/?term=754209375
FGFR3;ENST00000613647.4_1;NM_000142;3'UTR;DEL;g.chr4:1809111_1809112delTG;19;;0.093;64;625;https://www.ncbi.nlm.nih.gov/snp/?term=754209375
FGFR3;ENST00000412135.6_1;NM_000142;3'UTR;DEL;g.chr4:1809111_1809112delTG;16;;0.093;64;625;https://www.ncbi.nlm.nih.gov/snp/?term=754209375
FGFR3;ENST00000481110.6_4;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGTGT;17;;0.101;73;15;https://www.ncbi.nlm.nih.gov/snp/?term=776445794
FGFR3;ENST00000440486.7_2;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGTGT;18;;0.101;73;15;https://www.ncbi.nlm.nih.gov/snp/?term=776445794
FGFR3;ENST00000340107.8_1;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGTGT;18;;0.101;73;15;https://www.ncbi.nlm.nih.gov/snp/?term=776445794
FGFR3;ENST00000613647.4_1;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGTGT;19;;0.101;73;15;https://www.ncbi.nlm.nih.gov/snp/?term=776445794
FGFR3;ENST00000412135.6_1;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGTGT;16;;0.101;73;15;https://www.ncbi.nlm.nih.gov/snp/?term=776445794
FGFR3;ENST00000481110.6_4;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGT;17;;0.889;588;15;https://www.ncbi.nlm.nih.gov/snp/?term=34562534|1491318682|796894443
FGFR3;ENST00000440486.7_2;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGT;18;;0.889;588;15;https://www.ncbi.nlm.nih.gov/snp/?term=34562534|1491318682|796894443
FGFR3;ENST00000340107.8_1;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGT;18;;0.889;588;15;https://www.ncbi.nlm.nih.gov/snp/?term=34562534|1491318682|796894443
FGFR3;ENST00000613647.4_1;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGT;19;;0.889;588;15;https://www.ncbi.nlm.nih.gov/snp/?term=34562534|1491318682|796894443
FGFR3;ENST00000412135.6_1;NM_000142;3'UTR;DEL;g.chr4:1809128_1809131delGT;16;;0.889;588;15;https://www.ncbi.nlm.nih.gov/snp/?term=34562534|1491318682|796894443
FGFR3;ENST00000481110.6_4;NM_000142;3'UTR;SNP;g.chr4:1809787C>T;17;;0.999;742;1;https://www.ncbi.nlm.nih.gov/snp/?term=3135904
FGFR3;ENST00000440486.7_2;NM_000142;3'UTR;SNP;g.chr4:1809787C>T;18;;0.999;742;1;https://www.ncbi.nlm.nih.gov/snp/?term=3135904
FGFR3;ENST00000340107.8_1;NM_000142;3'UTR;SNP;g.chr4:1809787C>T;18;;0.999;742;1;https://www.ncbi.nlm.nih.gov/snp/?term=3135904
FGFR3;ENST00000613647.4_1;NM_000142;3'UTR;SNP;g.chr4:1809787C>T;19;;0.999;742;1;https://www.ncbi.nlm.nih.gov/snp/?term=3135904
FGFR3;ENST00000412135.6_1;NM_000142;3'UTR;SNP;g.chr4:1809787C>T;16;;0.999;742;1;https://www.ncbi.nlm.nih.gov/snp/?term=3135904
Does anyone have a hint for me? Kind regards, Daniel
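To check what the hard-coded awk column numbers actually refer to (for example $36 and $154), one can print the header name found at each index. A minimal bash/awk sketch, assuming the first line of the Funcotator MAF that does not start with # is the column header; maf_column_names is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show which header name sits at each hard-coded column number of a MAF.
# Assumption: comment lines start with '#'; the first remaining line is the header.
maf_column_names() {
  # $1: comma-separated column numbers, e.g. "1,36,154"; MAF text is read from stdin.
  awk -F '\t' -v cols="$1" '
    /^#/ { next }                          # skip comment lines
    {
      n = split(cols, idx, ",")
      for (i = 1; i <= n; i++) print idx[i] "\t" $(idx[i])
      exit                                 # only the header row is needed
    }'
}
```

Usage: maf_column_names "1,36,154,9,10,35,38,42,80,81,82,14" < output.maf (placeholder file name). Seeing the names makes it easier to tell whether the NM_ID column is a per-transcript field or one that is filled once per gene.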
OK, so the FGFR3 position chr4:1807894 should be in the CDS in HG19, which is the reference version I set as a parameter. In HG38 it is in the 3'UTR, which matches what Funcotator reported (see above).
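The CDS vs 3'UTR question can be checked directly against the GTF a data source uses, by listing every feature that overlaps the position. A minimal bash/awk sketch, assuming a standard 9-column GTF with 1-based inclusive coordinates and a transcript_id attribute in column 9; features_at is a hypothetical helper, and the GTF path in the usage line is a placeholder, not the actual file name inside the Funcotator data sources.

```shell
#!/usr/bin/env bash
# Sketch: list GTF features (CDS, UTR, exon, ...) overlapping one position.
# Assumptions: standard 9-column GTF, 1-based inclusive start/end,
# transcript_id "..." present in the attribute column.
features_at() {
  local chrom="$1" pos="$2" gtf="$3"
  awk -F '\t' -v chrom="$chrom" -v pos="$pos" '
    /^#/ { next }                                   # skip GTF header lines
    $1 == chrom && $4 <= pos && pos <= $5 {
      tid = "."
      if (match($9, /transcript_id "[^"]+"/))
        tid = substr($9, RSTART + 15, RLENGTH - 16) # text between the quotes
      print tid "\t" $3 "\t" $4 "\t" $5             # transcript, feature, start, end
    }' "$gtf"
}
```

Usage: features_at chr4 1807894 /path/to/gencode.gtf. Running it on the hg19 GTF from each data sources version would show whether that GTF places the position in a CDS or a UTR feature.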
Hi there, I have another example:
BRAF;ENST00000288602.11_3;NM_004333;Missense_Mutation;SNP;g.chr7:140453136A>T;4;c.1919T>A;p.V640E;0.323;197;413;https://www.ncbi.nlm.nih.gov/snp/?term=113488022
This should be V600E, which IGV also shows when I check the genomic position. I assume the transcript version (.11) belongs to HG38...
Does anyone have an idea?
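The size of the shift can be checked with the usual c.-position-to-codon arithmetic: codon = ceil(c / 3) and position within the codon = ((c - 1) mod 3) + 1. A small bash sketch; codon_of and pos_in_codon are illustrative names, and c.1799T>A is the commonly cited NM_004333 description of BRAF V600E, not something taken from the Funcotator output above.

```shell
#!/usr/bin/env bash
# Sketch: convert a CDS (c.) position to its codon number and position within the codon.
codon_of() {
  local c="$1"
  echo $(( (c + 2) / 3 ))          # integer ceil(c / 3)
}
pos_in_codon() {
  local c="$1"
  echo $(( (c - 1) % 3 + 1 ))      # 1, 2 or 3
}

echo "c.1919 -> codon $(codon_of 1919), position $(pos_in_codon 1919)"  # c.1919 -> codon 640, position 2
echo "c.1799 -> codon $(codon_of 1799), position $(pos_in_codon 1799)"  # c.1799 -> codon 600, position 2
echo "offset: $(( 1919 - 1799 )) nt = $(( (1919 - 1799) / 3 )) codons" # offset: 120 nt = 40 codons
```

Both changes fall on the second base of their codon and differ by exactly 120 nt (40 codons). For the same genomic variant, that is consistent with the annotated transcript counting 120 more coding bases upstream of the position than NM_004333, rather than with a different variant.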
@DanielAmsel The genomic regions Funcotator uses are based on the Gencode GTF files. Some time before Gencode v34 (the version used in the Funcotator v1.7 pre-bundled data sources), Gencode stopped natively creating gene annotations for HG19; their solution was to lift over their annotations from HG38. When we created the version 1.7 data sources release, we updated the Gencode data sources to the latest and greatest release at the time, and for HG19 the only resource available was the lifted-over files (full Gencode releases live here).
I believe what you're seeing is a result of this liftover.
Dear @jonn-smith , thank you for your answer. I have upgraded to HG38 and it works fine now. Best, Daniel