
HGNC nomenclature

Open nswh opened this issue 6 years ago • 2 comments

There is an inconsistency between the gene_details/gene_summary tables in GEMINI version 0.21.1 and the current HGNC nomenclature. For example, we use the gene name ADA2 in a GEMINI SQL query, but in the variants table the gene is named CECR1. Furthermore, in gene_details/gene_summary, ADA2 has is_hgnc=0 and CECR1 has is_hgnc=1. However, ADA2 is the current HGNC symbol and CECR1 is not.
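To make the report easier to reproduce, here is a minimal sketch using an in-memory SQLite database. The two tables are heavily simplified stand-ins for GEMINI's `variants` and `gene_summary` tables (only a few columns, and the rows are hand-entered to mirror the values described above); they are not a dump of a real GEMINI database.

```python
import sqlite3

# Toy stand-in for a GEMINI database. Schema and rows are assumptions
# that only mirror the values reported in this issue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variants (variant_id INTEGER, gene TEXT);
CREATE TABLE gene_summary (gene TEXT, is_hgnc INTEGER);
INSERT INTO variants VALUES (1, 'CECR1');
INSERT INTO gene_summary VALUES ('CECR1', 1);
INSERT INTO gene_summary VALUES ('ADA2', 0);
""")

# For each symbol, show its is_hgnc flag and how many variants carry it.
rows = conn.execute(
    """
    SELECT g.gene, g.is_hgnc, COUNT(v.variant_id) AS n_variants
    FROM gene_summary g
    LEFT JOIN variants v ON v.gene = g.gene
    WHERE g.gene IN (?, ?)
    GROUP BY g.gene, g.is_hgnc
    ORDER BY g.gene
    """,
    ("ADA2", "CECR1"),
).fetchall()

for gene, is_hgnc, n_variants in rows:
    print(gene, is_hgnc, n_variants)
```

With these toy rows, a query filtering on ADA2 finds no variants, while CECR1 is the symbol flagged as HGNC and the one actually present in the variants table.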

nswh avatar Aug 28 '18 06:08 nswh

Adding a comment on this error:

The gene tables use the GRCh37 build of the genome, for which the HGNC mapping for the Ensembl gene ID "ENSG00000093072" is CECR1. The latest GRCh38 build, however, maps the same Ensembl ID to the HGNC symbol "ADA2", and "CECR1" is now a previously used symbol. Hence the discrepancy. Updating the Ensembl version (from release 73 to release 95, which I guess is the latest available for GRCh37) would still show the same discrepancy until the new genome build is adopted for the gene tables.

You can see this here: http://useast.ensembl.org/biomart/martview/98a63008bc0be7d2201e6ea4dd1ba110 http://grch37.ensembl.org/biomart/martview/13f4e0e7743a9e8330cceb517be22621

If the VEP annotations are made using the GRCh37 build, then I guess the tables should follow the same build for proper mapping. If users have already migrated to the GRCh38 build for VEP, then new gene tables for that build are required. But I guess we still use the GRCh37/hg19 build for most annotations in GEMINI.
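As a possible stopgap on the user side, symbols can be translated between builds by going through the stable Ensembl gene ID. The sketch below is only an illustration: the `SYMBOL_BY_BUILD` table and the `translate_symbol` helper are hypothetical names, and the single entry is hand-written from the mapping described above. A real table would have to be exported from the two BioMart instances.

```python
# Hypothetical lookup: Ensembl gene ID -> symbol per genome build.
# The only entry is the one discussed in this thread.
SYMBOL_BY_BUILD = {
    "ENSG00000093072": {"GRCh37": "CECR1", "GRCh38": "ADA2"},
}


def translate_symbol(symbol, from_build, to_build):
    """Return the symbol used in to_build for the gene that is called
    `symbol` in from_build, or None if the gene is not in the table."""
    for per_build in SYMBOL_BY_BUILD.values():
        if per_build.get(from_build) == symbol:
            return per_build.get(to_build)
    return None


# Query written with the GRCh38 name, database annotated on GRCh37:
print(translate_symbol("ADA2", "GRCh38", "GRCh37"))
```

Keying on the Ensembl ID rather than on the symbol keeps the lookup stable when a symbol is renamed between builds.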

udp3f avatar Jan 15 '19 21:01 udp3f

@udp3f thanks for looking into this. I thought I had found a place in Ensembl where it was ADA2 for release 95, but maybe that was another resource. I am not going to get this in for the impending release, but if someone else wants to take on updating the gene tables, that would be a great contribution.

brentp avatar Jan 15 '19 23:01 brentp