mygene.info

Create a new NCBI data source to get complete gene summary from ASN dump

Open newgene opened this issue 1 year ago • 0 comments

The current gene summary data (the summary field) in the MyGene.info API is extracted from RefSeq records (see the current refseq data source).

It appears that RefSeq does not contain all of the gene summary text available from NCBI. For example, as reported in #129, gene POLA2 has summary text at NCBI that is not present in its RefSeq record, so it is missing from the current MyGene.info API.
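To check whether a given gene is affected, something like the following could be used. This is a minimal sketch, not part of the existing codebase: the helper names `summary_query_url` and `has_summary` are made up for illustration, and it assumes the public v3 query endpoint with the `q`, `fields`, and `species` parameters.

```python
from typing import Any, Dict
from urllib.parse import urlencode

# Public MyGene.info v3 query endpoint.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def summary_query_url(symbol: str, species: str = "human") -> str:
    """Build a query URL that asks only for the summary field of a gene symbol.

    Hypothetical helper for illustration; it only builds the URL and does not
    perform the request.
    """
    params = {"q": f"symbol:{symbol}", "fields": "summary", "species": species}
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def has_summary(hit: Dict[str, Any]) -> bool:
    """Return True if a single query hit carries a non-empty summary string."""
    summary = hit.get("summary")
    return isinstance(summary, str) and bool(summary.strip())
```

Fetching `summary_query_url("POLA2")` with any HTTP client and passing each entry of the response's `hits` list to `has_summary` should show whether the summary is currently missing.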

As suggested by the NCBI support team (Case #: CAS-941135-X3W9H8 for the record), the complete gene summary text is available from NCBI's ASN.1 binary dump files. We can create a new ncbi_gene data source that extracts the gene summary text from these ASN.1 binary dump files.
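As a rough sketch of what the document-building step of such a data source might look like, assuming the ASN.1 dump has already been decoded into `(gene_id, summary)` pairs by some upstream step that is not shown here. The function name `summary_docs` and the input shape are assumptions for illustration, not the actual implementation:

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def summary_docs(
    records: Iterable[Tuple[str, Optional[str]]]
) -> Iterator[Dict[str, str]]:
    """Yield one document per gene that has non-empty summary text.

    `records` is assumed to be (gene_id, summary) pairs already extracted
    from the ASN.1 dump; decoding the dump itself is out of scope here.
    """
    for gene_id, summary in records:
        text = (summary or "").strip()
        if not text:
            # Skip genes with no summary so they do not produce empty fields.
            continue
        # Documents are keyed by gene ID in "_id" and carry the summary text.
        yield {"_id": str(gene_id), "summary": text}
```

Skipping empty summaries keeps the new source from writing blank `summary` values over data that another source may already provide.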

newgene, Sep 04 '22 22:09