Add associated conditions from phenotypes section of Entrez gene

Open malachig opened this issue 3 years ago • 2 comments

It would be great to be able to pull associated condition information from Entrez via mygene.info.

For example, for BRAF (https://www.ncbi.nlm.nih.gov/gene/673):

Under Phenotypes, they list conditions from the Genetic Testing Registry such as: Cardiofaciocutaneous syndrome 1, Dabrafenib response, ..., Vemurafenib response.

We would really like to pull such information into CIViC along with other critical gene info we already obtain from myvariant.info (e.g. https://civicdb.org/events/genes/5/summary/variants/2826/summary)

malachig, Oct 29 '20 15:10

I also think this data would be super useful, so I looked into it a bit. Just recording what I found...

Most of our NCBI data comes from https://ftp.ncbi.nlm.nih.gov/gene/DATA/. It looks like those phenotypes come from mim2gene_medgen. If I search for @malachig's example gene ID 673, I get the following records:

$ awk '$2==673' mim2gene_medgen
115150  673     phenotype        GeneMap        CN029449        -
163950  673     phenotype        GeneReviews    C4551602        -
164757  673     gene    -       -       -
211980  673     phenotype        GeneMap        C0684249        -
613706  673     phenotype        GeneMap        C3150970        -
613707  673     phenotype        GeneMap        C3150971        -

It looks like it got five out of the seven phenotypes listed on https://www.ncbi.nlm.nih.gov/gene/673
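
For anyone who wants to do the same lookup outside the shell, here is a minimal Python sketch of the awk filter above. It assumes the file is tab-delimited with columns in the order seen in that output (MIM number, gene ID, type, source, MedGen ID, comment); that layout is inferred from the rows in this comment, not confirmed against the file's documentation. The function name `phenotypes_for_gene` is made up for illustration, and the sample rows are the ones printed above, re-joined with tabs.

```python
import csv
import io

# Rows copied from the awk output in this thread, re-joined with tabs.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("115150", "673", "phenotype", "GeneMap", "CN029449", "-"),
    ("163950", "673", "phenotype", "GeneReviews", "C4551602", "-"),
    ("164757", "673", "gene", "-", "-", "-"),
    ("211980", "673", "phenotype", "GeneMap", "C0684249", "-"),
    ("613706", "673", "phenotype", "GeneMap", "C3150970", "-"),
    ("613707", "673", "phenotype", "GeneMap", "C3150971", "-"),
])


def phenotypes_for_gene(handle, gene_id):
    """Return (MIM number, source, MedGen ID) for each phenotype row of gene_id."""
    hits = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, '#'-prefixed header lines, and short rows.
        if len(row) < 5 or row[0].startswith("#"):
            continue
        mim, gene, row_type, source, medgen_id = row[:5]
        # Keep only phenotype rows; the 'gene' row is the gene's own MIM entry.
        if gene == str(gene_id) and row_type == "phenotype":
            hits.append((mim, source, medgen_id))
    return hits


records = phenotypes_for_gene(io.StringIO(SAMPLE), 673)
for mim, source, medgen_id in records:
    print(mim, source, medgen_id)
```

To run it against the real download, pass an open file handle instead of the `io.StringIO` sample, e.g. `open("mim2gene_medgen", newline="")`.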

The two MedGen IDs that aren't found for gene 673 are not found anywhere in the mim2gene_medgen file.

$ grep -c CN239586 mim2gene_medgen.txt
0
$ grep -c CN239577 mim2gene_medgen.txt
0

Hmm, not sure what the source is for those two missing ones...

andrewsu, Oct 29 '20 16:10

I also checked the MedGen download and didn't see such a file.

Unless someone can provide a link to the full file, we will probably just go with mim2gene_medgen.txt.

FYI, we already have a BioThings API that connects gene -> disease/phenotype: the EBIGene2Phenotype API. For example, you can query by HGNC ID to get associated conditions: https://biothings.ncats.io/ebigene2phenotype/gene/1097
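
A rough sketch of calling that endpoint from Python, using only the standard library. The URL pattern comes from the example link above; the helper names `gene_url` and `fetch_gene` are hypothetical, and the assumption that the endpoint returns JSON is not verified here. No response fields are assumed, and the service may not be available at this address indefinitely.

```python
import json
import urllib.request

# Base URL taken from the example link in this comment.
BASE_URL = "https://biothings.ncats.io/ebigene2phenotype/gene"


def gene_url(hgnc_id):
    """Build the per-gene URL for a numeric HGNC ID (e.g. 1097)."""
    return f"{BASE_URL}/{int(hgnc_id)}"


def fetch_gene(hgnc_id, timeout=10):
    """Fetch and decode the document for one gene.

    Assumes a JSON response; the caller inspects whatever comes back.
    """
    with urllib.request.urlopen(gene_url(hgnc_id), timeout=timeout) as resp:
        return json.load(resp)


# Only the URL is built here; call fetch_gene(1097) to make the request.
print(gene_url(1097))
```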

kevinxin90, Oct 29 '20 21:10