
Trouble linking aeolus compounds

Open cbizon opened this issue 6 years ago • 3 comments

If I do a simple query q=siltuximab, I get 5 results, with these identifiers and keys:

57894-421 ['_id', '_score', 'ndc']
57894-420 ['_id', '_score', 'ndc']
CHEMBL1743070 ['_id', '_score', 'chembl', 'drugcentral']
DB09036 ['_id', '_score', 'drugbank']
T4H8FMA7IM ['_id', '_score', 'aeolus', 'unii']
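For anyone wanting to reproduce a listing like the one above, here is a minimal sketch. It assumes the query response is a JSON object with a `hits` list whose entries each carry an `_id`; the helper name `summarize_hits` is made up for illustration, and the field values in the example are empty placeholders, not real API output.

```python
def summarize_hits(response):
    """Return (_id, sorted top-level keys) for each hit in a query response."""
    return [(hit["_id"], sorted(hit)) for hit in response.get("hits", [])]


# Placeholder response mirroring two of the hits listed above.
# Only the key names come from the listing; the values are dummies.
example = {
    "hits": [
        {"_id": "CHEMBL1743070", "_score": 0.0, "chembl": {}, "drugcentral": {}},
        {"_id": "T4H8FMA7IM", "_score": 0.0, "aeolus": {}, "unii": {}},
    ]
}

for doc_id, keys in summarize_hits(example):
    print(doc_id, keys)
```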

The way I actually want to query this data is by asking for compounds that have a particular aeolus outcome. So if I query for a particular outcome and it matches siltuximab, I will get back only the aeolus and unii information. I won't get chembl or drugcentral, which makes it hard to give this compound an identifier that I can use to integrate other data.

I don't know if this is a general feature or if I just happened to find an isolated case, but in testing it seemed that I often didn't get either a chembl or a chebi node when querying by aeolus.

cbizon avatar Jun 27 '18 15:06 cbizon

@newgene @andrewsu This is indeed a case of how we merge MyChem.info docs. By default, we merge docs based on the InchiKey. However, 'siltuximab' is a peptide with no available InchiKey. The 5 results returned by queries like http://mychem.info/v1/query?q=siltuximab all refer to the same drug, but they are shown as 5 separate docs in MyChem.info. A potential solution is to group them by drug name when the InchiKey is not available.

kevinxin90 avatar Jun 27 '18 18:06 kevinxin90

That's true. We are working on an id mapping utility function to merge these docs into one. Essentially when InchiKey is not available, we will use a priority list to define the primary key ("_id" field), e.g. drugbank id would be preferred, then chebi, then chembl, etc. As long as we keep this priority order consistent for all data sources in mychem.info, different sources can still be merged even when InchiKey is not available.
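The priority idea can be sketched roughly as follows. This is only an illustration, not the actual mychem.info utility: the names `ID_PRIORITY`, `pick_primary_id`, and `merge_by_primary_id` are hypothetical, the order after chembl is not specified in this thread, and the example assumes each source record already knows its cross-references (for instance, that the aeolus record carries the drugbank id), which is the part the mapping utility would have to supply.

```python
# Hypothetical priority order: InchiKey first, then the order given above.
ID_PRIORITY = ["inchikey", "drugbank", "chebi", "chembl"]


def pick_primary_id(ids, priority=ID_PRIORITY):
    """Return the first available identifier following the priority list."""
    for source in priority:
        value = ids.get(source)
        if value:
            return value
    return None


def merge_by_primary_id(records):
    """Merge (ids, data) pairs that resolve to the same primary key."""
    merged = {}
    for ids, data in records:
        key = pick_primary_id(ids)
        if key is None:
            continue  # no usable identifier; the record is left unmerged
        merged.setdefault(key, {}).update(data)
    return merged


# Assumed cross-references, for illustration only.
records = [
    ({"drugbank": "DB09036"}, {"drugbank": {}}),
    ({"drugbank": "DB09036", "chembl": "CHEMBL1743070"}, {"chembl": {}}),
    ({"drugbank": "DB09036", "unii": "T4H8FMA7IM"}, {"aeolus": {}, "unii": {}}),
]
print(sorted(merge_by_primary_id(records)["DB09036"]))
```

Because every source applies the same order, records from different sources resolve to the same `_id` without needing an InchiKey.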

We are in the middle of a major refactoring of mychem.info, and these issues are on our list to be fixed.

newgene avatar Jun 27 '18 22:06 newgene

@cbizon It took me a while to understand what you are asking for. Does this query solve your problem?

http://mychem.info/v1/query?q=aeolus.outcomes.name:Hostility
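If it helps, the same query URL can be built programmatically. This is a small sketch using only the Python standard library; the helper name `build_outcome_query` is made up, and quoting multi-word outcome names as a phrase is an assumption about the query syntax rather than something confirmed in this thread.

```python
from urllib.parse import urlencode

BASE_URL = "http://mychem.info/v1/query"


def build_outcome_query(outcome, base_url=BASE_URL):
    """Build a query URL that searches the aeolus.outcomes.name field."""
    # Assumption: multi-word outcome names need to be quoted as a phrase.
    term = f'"{outcome}"' if " " in outcome else outcome
    return f"{base_url}?{urlencode({'q': f'aeolus.outcomes.name:{term}'})}"


print(build_outcome_query("Hostility"))
```

Note that `urlencode` percent-encodes the colon, so the printed URL differs textually from the one above while carrying the same query string.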

greg-k-taylor avatar Jan 03 '19 19:01 greg-k-taylor