
New Data Source: GSRS

Open newgene opened this issue 9 months ago • 1 comments

URL: https://gsrs.ncats.nih.gov

GSRS provides a downloadable .gsrs file, which is essentially a compressed 7-zip file containing a list of JSON objects.
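A minimal sketch of reading such a dump once it has been decompressed. It assumes the decompressed text holds one JSON object per line, which is not confirmed in this issue; `iter_gsrs_records` and the sample field names (`uuid`, `smiles`) are hypothetical and used for illustration only. Decompression is left out on purpose, since the exact container format should be checked against a real download first.

```python
import io
import json
from typing import Iterable, Iterator


def iter_gsrs_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of already-decompressed text.

    Assumption: each line is a standalone JSON object.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if isinstance(record, dict):
            yield record


# Hypothetical two-record sample standing in for the decompressed dump.
sample = io.StringIO('{"uuid": "a1", "smiles": "CCO"}\n\n{"uuid": "b2"}\n')
records = list(iter_gsrs_records(sample))
```

Streaming line by line keeps memory use flat, which may matter if the full dump is large.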

NOTE: The GSRS resource is likely the successor of the previous GINAS resource (https://ginas.ncats.nih.gov now redirects to https://gsrs.ncats.nih.gov). We can include both gsrs and ginas for now, and remove ginas once we no longer need it.

newgene avatar May 09 '24 22:05 newgene

The JSON objects do not seem to include an inchi or inchikey field (a smiles field is available, though). We are still confirming this with the GSRS team.

newgene avatar May 09 '24 22:05 newgene

We confirmed with the GSRS team that inchikey was calculated based on the smiles field.

The KNIME workflow has an example of how to calculate inchikeys from GSRS using RDKit nodes, if that helps: https://hub.knime.com/-/spaces/-/~8MCL_tgTaY7uA37U/current-state/

In this case, we might just use our existing mapping from smiles to inchikey at MyChem.info to get the inchikey value as the primary _id key.
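A rough sketch of that idea, assuming the existing smiles-to-inchikey mapping can be exposed as a plain dictionary lookup. The function name `assign_id`, the top-level `smiles` key, and the `gsrs` wrapper key are hypothetical, not the actual MyChem.info parser interface.

```python
from typing import Optional


def assign_id(record: dict, smiles_to_inchikey: dict) -> Optional[dict]:
    """Wrap a GSRS record in a document keyed by inchikey.

    Returns None when the record has no smiles value or the smiles
    value is missing from the mapping, so the caller decides how to
    handle records without a structure.
    """
    smiles = record.get("smiles")  # assumed field location
    inchikey = smiles_to_inchikey.get(smiles) if smiles else None
    if inchikey is None:
        return None
    return {"_id": inchikey, "gsrs": record}


# Illustrative mapping with a single entry (ethanol).
mapping = {"CCO": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"}
doc = assign_id({"uuid": "a1", "smiles": "CCO"}, mapping)
skipped = assign_id({"uuid": "b2"}, mapping)
```

Returning None rather than raising leaves room for a later decision about which fallback `_id` to use for records that carry no smiles value.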

newgene avatar Jun 13 '24 17:06 newgene

@newgene Each record signifies one of the following: a chemical, concept, polymer, nucleic acid, protein, mixture, substance group, or diverse. Only chemicals and polymers have SMILES and InChI. What should each record in our API signify?

PS this dashboard is useful for data exploration

NeuralFlux avatar Jun 21 '24 17:06 NeuralFlux