mygene.info
query API returns empty summary on some genes
We are using this query ( https://mygene.info/v3/query?q=symbol:POLA2&size=1&species=human&fields=name,summary ) to get the summary of gene POLA2, but the call returns no summary. However, if we search for this gene in NCBI, its gene page does show a summary: https://www.ncbi.nlm.nih.gov/gene/23649. Does MyGene.info pull the summary value from NCBI or from another data source? A query for DRG1 has a similar issue.
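For anyone who wants to reproduce this, here is a minimal sketch of the same query in Python. The helper names (`build_query_url`, `summary_or_none`, `fetch_summary`) are made up for illustration, and the code assumes the response is a JSON object with a `hits` list whose entries only carry a `summary` key when a summary exists.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://mygene.info/v3/query"


def build_query_url(symbol, species="human"):
    """Build the same query URL as in the report, for any gene symbol."""
    params = {
        "q": f"symbol:{symbol}",
        "size": 1,
        "species": species,
        "fields": "name,summary",
    }
    # Keep ':' and ',' unescaped so the URL matches the one quoted above.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


def summary_or_none(response):
    """Return the summary of the first hit, or None if it is absent.

    Assumption: `response` is the decoded JSON body and has a "hits" list.
    """
    hits = response.get("hits", [])
    if not hits:
        return None
    return hits[0].get("summary")


def fetch_summary(symbol):
    """Hypothetical helper: run the query over the network and extract the summary."""
    with urlopen(build_query_url(symbol)) as resp:
        return summary_or_none(json.load(resp))
```

Calling `fetch_summary("POLA2")` and `fetch_summary("DRG1")` should return `None` for as long as the issue described here persists.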
@jingjingbic thanks for reporting this to us. We did some investigation and found out why this happens.
The summary field in MyGene.info was obtained from NCBI's RefSeq records.
For example, the summary of gene CDK2 comes from NM_001798 (under the "COMMENT" section, starting with "Summary:").
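To illustrate the idea (this is only a rough sketch, not the actual MyGene.info pipeline code), pulling the "Summary:" paragraph out of the COMMENT section of a GenBank-style flat file could look like the following. The function name `extract_refseq_summary` and the `SAMPLE` record are made up; the sample contains placeholder text, not real RefSeq content. The sketch assumes paragraphs inside COMMENT are separated by blank lines.

```python
def extract_refseq_summary(genbank_text):
    """Return the text of the COMMENT paragraph that starts with "Summary:", or None."""
    comment_lines = []
    in_comment = False
    for line in genbank_text.splitlines():
        if line.startswith("COMMENT"):
            in_comment = True
            comment_lines.append(line[len("COMMENT"):].strip())
        elif in_comment and (line.startswith(" ") or not line.strip()):
            # Indented or blank lines continue the COMMENT section.
            comment_lines.append(line.strip())
        elif in_comment:
            # A new keyword at column 0 (e.g. FEATURES) ends the section.
            break

    # Group wrapped lines into paragraphs (assumed to be split by blank lines).
    paragraphs, current = [], []
    for text in comment_lines:
        if text:
            current.append(text)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    for paragraph in paragraphs:
        if paragraph.startswith("Summary:"):
            return paragraph[len("Summary:"):].strip()
    return None


# Synthetic placeholder record, NOT real RefSeq data.
SAMPLE = """\
LOCUS       PLACEHOLDER
COMMENT     Placeholder provenance text that wraps
            onto a second line.

            Summary: Placeholder summary text that wraps
            onto a second line.

            Placeholder trailing paragraph.
FEATURES             Location/Qualifiers
"""

print(extract_refseq_summary(SAMPLE))
```

A record whose COMMENT section has no "Summary:" paragraph yields `None`, which corresponds to the empty summary seen for POLA2.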
This worked for nearly all genes in the past. However, as you pointed out, we are now starting to see some gene summaries that do not come from the corresponding RefSeq record.
I think there could be two reasons:
- NCBI may be delayed in adding the summary to some RefSeq records (or this could potentially be a mistake). In that case we will just wait for RefSeq to update. MyGene.info stays closely synced with NCBI: once RefSeq is updated (the current release is 213), MyGene.info should pick up the changes within a week or so.
- NCBI likely has another place to store some gene summary data, in addition to RefSeq records. We could not find the source of the POLA2 summary in any of the data files we sync from NCBI, so we will have to reach out to NCBI about this.
Either way, it looks like this is something we should double-check with NCBI. Depending on their response, we can decide whether any changes are needed on the MyGene.info side.
We contacted the NCBI helpdesk, and they confirmed the following:
We currently do not add the summaries imported from the Alliance of Genome Resources onto the RefSeq transcript records. Summaries are also not added to model RefSeqs.
The complete set of gene summary text is available from NCBI's ASN1 binary dump files rather than from the RefSeq records. We can modify our pipeline to extract gene summaries from these files instead. A separate issue, #130, was created for this task.
In the meantime, a temporary fix for human genes is in place.