
Import DrugBank-DrugCentral mappings

Open cthoyt opened this issue 1 year ago • 3 comments

This PR imports the 3,960 mappings between molecules in DrugBank and DrugCentral that were predicted through exact string matches, manually reviewed by @caufieldjh, and stored in http://kg-hub-public-data.s3.amazonaws.com/frozen_incoming_data/drug-id-maps-0.2.sssom.tsv. Some notes:

  1. DrugCentral itself only provides CAS mappings in addition to structures as SMILES/InChI.
  2. This file contains novel mappings between DrugCentral and several other controlled vocabularies (ChEBI, ChEMBL Compound, Therapeutic Target Database Drugs, PharmGKB). This PR starts with DrugBank as a proof of concept. DrugBank does not contain any primary mappings to DrugCentral, as far as I can tell. The import script can be trivially updated to import some/the rest.
  3. Some provenance information about which files were used to generate these mappings didn't seem strictly necessary and is not propagated through to the Biomappings file.
  4. I used pyobo to add in missing labels.
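As a rough illustration of the import step (not the actual import script in this PR), an SSSOM TSV file can be read with the standard library by skipping the `#`-prefixed metadata header and keeping only rows for one prefix pair. The sketch assumes the standard SSSOM column names `subject_id`, `predicate_id`, and `object_id`, and assumes DrugBank is the subject and DrugCentral the object; the real file may use different prefix spellings or the opposite orientation. `read_sssom_rows`, `select_pair`, and the `SAMPLE` text are hypothetical names and data introduced here.

```python
import csv

# Tiny stand-in for the real file. The first data row mirrors the example
# quoted later in this thread; the second row is a made-up placeholder.
SAMPLE = (
    "#mapping_set_id: https://example.org/placeholder-set\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "drugbank:DB00001\tskos:exactMatch\tdrugcentral:2995\n"
    "chebi:0000\tskos:exactMatch\tdrugcentral:0000\n"
)


def read_sssom_rows(text):
    """Parse SSSOM TSV text, skipping blank lines and '#' metadata lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def select_pair(rows, subject_prefix, object_prefix):
    """Keep rows whose subject/object CURIEs use the requested prefixes."""
    selected = []
    for row in rows:
        s_prefix, _, s_id = row["subject_id"].partition(":")
        o_prefix, _, o_id = row["object_id"].partition(":")
        # Prefix case is normalized because spellings vary between sources.
        if s_prefix.lower() == subject_prefix and o_prefix.lower() == object_prefix:
            selected.append((s_id, row["predicate_id"], o_id))
    return selected


drugbank_rows = select_pair(read_sssom_rows(SAMPLE), "drugbank", "drugcentral")
```

Importing another vocabulary from the same file would then only require changing the prefix arguments.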

Update: this PR now filters out DrugCentral-DrugBank mappings that are already available by querying DrugCentral's PostgreSQL database.
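The filtering idea can be sketched as a set difference, assuming the already-known (DrugBank ID, DrugCentral ID) pairs have been fetched from DrugCentral's database beforehand (the query itself is not shown here, and this is not the code from the PR). `filter_novel` and the dictionary keys are hypothetical names; the second predicted mapping uses made-up identifiers.

```python
def filter_novel(predicted, known_pairs):
    """Drop predicted mappings whose ID pair already exists in a primary source.

    predicted:   iterable of dicts with "drugbank" and "drugcentral" keys
    known_pairs: iterable of (drugbank_id, drugcentral_id) tuples
    """
    # IDs are compared as strings, since a database may return integer IDs.
    known = {(str(db), str(dc)) for db, dc in known_pairs}
    return [
        m for m in predicted
        if (str(m["drugbank"]), str(m["drugcentral"])) not in known
    ]


predicted = [
    {"drugbank": "DB00001", "drugcentral": "2995"},   # listed by DrugCentral already
    {"drugbank": "DB99999", "drugcentral": "99999"},  # made-up placeholder
]
known_pairs = [("DB00001", 2995)]

novel = filter_novel(predicted, known_pairs)
```

Only the placeholder mapping survives in `novel`, because the DB00001/2995 pair is already present in the known set.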

cthoyt • Nov 17 '22 12:11

Thanks @cthoyt! Let me know if/when the ingests hit any snags so I can fix the table.

caufieldjh • Nov 17 '22 13:11

Hi @cthoyt and @caufieldjh, thanks for working on this! Generally, Biomappings only includes mappings that aren't provided by any of the primary sources. I spot-checked 10 entries from the new additions to mappings.tsv, and in each case I found that DrugCentral already lists the given DrugBank ID on the drug's landing page. For instance, taking the first new entry:

drugbank DB00001 Lepirudin skos:exactMatch drugcentral 2995 lepirudin

https://drugcentral.org/drugcard/2995 provides: [image not preserved]

Ideally, these existing mappings would be filtered out and only novel/missing mappings added.

bgyori • Nov 17 '22 16:11

For these sources, here are the relationships for mapping availability:

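| Source/target | CHEBI | CHEMBL | DrugBank | DrugCentral | PharmGKB.drug | ttd.drug |
|---|---|---|---|---|---|---|
| CHEBI to | - | Yes | No | No | No | No |
| CHEMBL to | Yes | - | Yes | Yes | Yes | No |
| DrugBank to | Yes | Yes | - | No | Yes | Yes |
| DrugCentral to | Yes | Yes | Yes | - | No | No |
| PharmGKB.drug to | Yes | No | Yes | No | - | Yes |
| ttd.drug to | Yes | No | No | No | No | - |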

where "Yes" means at least some mappings are available. It doesn't account for completeness, or for the fact that many of these mappings are one-to-many or many-to-one, since sources vary in size and specificity.
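To make the matrix easier to query, here is a minimal sketch that encodes it as a dictionary and lists the source pairs with no primary mappings in either direction. It assumes each row omits the source's own column (the diagonal), so the five values per row line up with the remaining five targets in header order. `build_availability` and `pairs_without_primary_mappings` are hypothetical helper names.

```python
from itertools import combinations

SOURCES = ["CHEBI", "CHEMBL", "DrugBank", "DrugCentral", "PharmGKB.drug", "ttd.drug"]

# One entry per row of the table; the source's own column is skipped.
ROW_VALUES = {
    "CHEBI":         ["Yes", "No", "No", "No", "No"],
    "CHEMBL":        ["Yes", "Yes", "Yes", "Yes", "No"],
    "DrugBank":      ["Yes", "Yes", "No", "Yes", "Yes"],
    "DrugCentral":   ["Yes", "Yes", "Yes", "No", "No"],
    "PharmGKB.drug": ["Yes", "No", "Yes", "No", "Yes"],
    "ttd.drug":      ["Yes", "No", "No", "No", "No"],
}


def build_availability(sources, row_values):
    """Map (source, target) to True when the source provides some mappings."""
    availability = {}
    for source in sources:
        targets = [t for t in sources if t != source]
        for target, value in zip(targets, row_values[source]):
            availability[(source, target)] = value == "Yes"
    return availability


def pairs_without_primary_mappings(sources, availability):
    """Return unordered pairs where neither side maps to the other."""
    return [
        (a, b)
        for a, b in combinations(sources, 2)
        if not availability[(a, b)] and not availability[(b, a)]
    ]


availability = build_availability(SOURCES, ROW_VALUES)
unmapped = pairs_without_primary_mappings(SOURCES, availability)
```

Under this reading, DrugBank to DrugCentral is "No" while DrugCentral to DrugBank is "Yes", which matches the observations earlier in the thread. Pairs returned in `unmapped` are the ones where neither source provides any mappings, though, as noted, "Yes" says nothing about completeness.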

caufieldjh • Nov 18 '22 18:11