
Examples of non-ideal merging of records

Open andrewsu opened this issue 1 year ago • 1 comments

The merging of multiple records in source databases into a single record in mychem.info is a challenging process, and one where I doubt we'll ever get it perfectly "right". Having said that, I noticed an example where the current merging is not ideal, and so I'm creating this issue to document this example and others like it.

This is the API call that illustrates this example: https://mychem.info/v1/chem/GVJHHUAWPYXKBD-IEOSBIPESA-N?fields=chembl.molecule_chembl_id,chembl.max_phase,chembl.pref_name,drugcentral.xrefs.chembl_id

{
  "_id": "GVJHHUAWPYXKBD-IEOSBIPESA-N",
  "_version": 1,
  "chembl": {
    "_license": "http://bit.ly/2KAUCAm",
    "max_phase": 0,
    "molecule_chembl_id": "CHEMBL47",
    "pref_name": "VITAMIN E"
  },
  "drugcentral": [
    {
      "_license": "http://bit.ly/2SeEhUy",
      "xrefs": {
        "chembl_id": [
          "CHEMBL3989727",
          "CHEMBL2108106"
        ]
      }
    },
    {
      "_license": "http://bit.ly/2SeEhUy",
      "xrefs": {
        "chembl_id": [
          "CHEMBL3989727",
          "CHEMBL47"
        ]
      }
    }
  ]
}

mychem maps this record to only a single ChEMBL ID, CHEMBL47, but DrugCentral maps it to two additional IDs: CHEMBL3989727 and CHEMBL2108106. All of these IDs are some variant of Vitamin E. One reason this is confusing is that CHEMBL47 reports "max_phase": 0, whereas the other two are "max_phase": 4 (what one would expect for Vitamin E).
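
To help collect other examples like this one, here is a minimal sketch of a check that flags ChEMBL IDs that DrugCentral cross-references but that are missing from the merged `chembl` field. The helper names `as_list` and `unmerged_chembl_ids` are hypothetical (not part of mychem.info or any client library), and the sketch assumes that `chembl` and `drugcentral` may each come back as either a single object or a list, as `drugcentral` does in the response above. The record is pasted inline rather than fetched, so it runs offline.

```python
def as_list(value):
    """Return [] for None, the value itself if it is a list, else [value]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def unmerged_chembl_ids(record):
    """List ChEMBL IDs in drugcentral.xrefs that are absent from chembl.

    Hypothetical helper: assumes the field layout shown in the response above.
    """
    merged = {
        entry["molecule_chembl_id"]
        for entry in as_list(record.get("chembl"))
        if "molecule_chembl_id" in entry
    }
    xref_ids = set()
    for entry in as_list(record.get("drugcentral")):
        xref_ids.update(as_list(entry.get("xrefs", {}).get("chembl_id")))
    return sorted(xref_ids - merged)


# Trimmed copy of the response above (license fields omitted).
record = {
    "_id": "GVJHHUAWPYXKBD-IEOSBIPESA-N",
    "chembl": {
        "max_phase": 0,
        "molecule_chembl_id": "CHEMBL47",
        "pref_name": "VITAMIN E",
    },
    "drugcentral": [
        {"xrefs": {"chembl_id": ["CHEMBL3989727", "CHEMBL2108106"]}},
        {"xrefs": {"chembl_id": ["CHEMBL3989727", "CHEMBL47"]}},
    ],
}

print(unmerged_chembl_ids(record))
# ['CHEMBL2108106', 'CHEMBL3989727']
```

A non-empty result does not by itself mean the merge is wrong, only that the two sources disagree about which ChEMBL entries belong to the record.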

andrewsu · Jun 26 '23 16:06