mygene.info
Add associated conditions from phenotypes section of Entrez gene
It would be great to be able to pull associated condition
information from Entrez via mygene.info.
For example, for BRAF (https://www.ncbi.nlm.nih.gov/gene/673):
Under phenotypes they list conditions from the Genetic Testing Registry, such as: Cardiofaciocutaneous syndrome 1, Dabrafenib response, ..., Vemurafenib response.
We would really like to pull such information into CIViC along with other critical gene info we already obtain from myvariant.info (e.g. https://civicdb.org/events/genes/5/summary/variants/2826/summary)
I also think this data would be super useful, so I looked into it a bit. Just recording what I found...
Most of our NCBI data comes from https://ftp.ncbi.nlm.nih.gov/gene/DATA/. It looks like those phenotypes come from mim2gene_medgen. If I search for @malachig's example gene ID 673, I get the following records:
$ awk '$2==673' mim2gene_medgen
115150 673 phenotype GeneMap CN029449 -
163950 673 phenotype GeneReviews C4551602 -
164757 673 gene - - -
211980 673 phenotype GeneMap C0684249 -
613706 673 phenotype GeneMap C3150970 -
613707 673 phenotype GeneMap C3150971 -
It looks like that covers five of the seven phenotypes listed on https://www.ncbi.nlm.nih.gov/gene/673 (the sixth record above is the gene entry itself).
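For anyone who wants to do the same lookup from a script, here is a rough Python sketch of the awk filter above. The column names are my own labels, inferred from the six fields in the records shown, and `phenotypes_for_gene` is a hypothetical helper, not part of any mygene.info code. It splits on whitespace the way awk does by default.

```python
# Assumed column order, inferred from the six fields in the records above.
COLUMNS = ["mim_number", "gene_id", "type", "source", "medgen_cui", "comment"]


def phenotypes_for_gene(lines, gene_id):
    """Return (MIM number, MedGen CUI) pairs for the phenotype rows of one gene."""
    results = []
    for line in lines:
        # Skip blank lines and any header/comment line starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()  # whitespace split, like awk's default
        if len(fields) < 5:
            continue  # malformed row; ignore it
        row = dict(zip(COLUMNS, fields))
        if row["gene_id"] == str(gene_id) and row["type"] == "phenotype":
            results.append((row["mim_number"], row["medgen_cui"]))
    return results
```

Usage would be something like `with open("mim2gene_medgen") as fh: phenotypes_for_gene(fh, 673)`, assuming the file has been downloaded to the working directory.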
The two MedGen IDs that aren't found for gene 673 are not found anywhere in the mim2gene_medgen file.
$ grep -c CN239586 mim2gene_medgen.txt
0
$ grep -c CN239577 mim2gene_medgen.txt
0
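The two grep checks above could also be done in one pass. This is only a sketch: `missing_medgen_ids` is a hypothetical helper, and it does a plain substring match per line, the same as grep, rather than parsing columns.

```python
def missing_medgen_ids(lines, wanted):
    """Return the IDs from `wanted` that appear on no line (grep -c would print 0)."""
    remaining = set(wanted)
    for line in lines:
        # Drop every ID that occurs somewhere on this line.
        remaining -= {cui for cui in remaining if cui in line}
        if not remaining:
            break  # everything was found; no need to read further
    return sorted(remaining)
```

For example, `with open("mim2gene_medgen.txt") as fh: missing_medgen_ids(fh, ["CN239586", "CN239577"])` would return both IDs if neither occurs in the file, which matches the zero counts above.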
Hmm, not sure what the source is for those two missing ones...
I also checked the MedGen downloads and didn't see such a file.
Unless someone can provide a link to the full file, we will probably just go with mim2gene_medgen.txt.
FYI, we already have a BioThings API that connects gene -> disease/phenotype: the EBIGene2Phenotype API. For example, you can query by HGNC ID to get associated conditions: https://biothings.ncats.io/ebigene2phenotype/gene/1097