biomappings
Import DrugBank-DrugCentral mappings
This PR imports the 3,960 mappings between molecules in DrugBank and DrugCentral that were predicted through exact string matches, manually reviewed by @caufieldjh, and stored in http://kg-hub-public-data.s3.amazonaws.com/frozen_incoming_data/drug-id-maps-0.2.sssom.tsv. Some notes:
- DrugCentral itself only provides CAS mappings in addition to structures as SMILES/InChI.
- This file contains novel mappings between DrugCentral and several other controlled vocabularies (ChEBI, ChEMBL Compound, Therapeutic Target Database Drugs, PharmGKB). This PR starts with DrugBank as a proof of concept. DrugBank does not contain any primary mappings to DrugCentral, as far as I can tell. The import script can be trivially updated to import some or all of the rest.
- Some provenance information about which files were used to generate these mappings didn't seem strictly necessary and is not propagated through to the Biomappings file.
- I used `pyobo` to add in missing labels.
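As a rough illustration of the selection step described in the notes above, here is a minimal sketch that reads an SSSOM-style TSV and keeps only the rows for one prefix pair. It is not the PR's import script: the helper name `select_pairs`, the sample rows, the CURIE prefixes, and the subject/object direction are all assumptions, and the real file may use a different column set.

```python
import csv
import io

# Hypothetical excerpt in an SSSOM-like TSV layout. The prefixes, the
# subject/object direction, and the second row (a made-up placeholder ID)
# are assumptions for illustration only.
SAMPLE = """\
# mapping_set_id: example
subject_id\tpredicate_id\tobject_id
DrugCentral:2995\tskos:exactMatch\tDRUGBANK:DB00001
DrugCentral:2995\tskos:exactMatch\tCHEBI:0000000
"""


def select_pairs(text, subject_prefix, object_prefix):
    """Yield (subject local ID, predicate, object local ID) for one prefix pair."""
    # SSSOM files may start with a commented metadata header; skip those lines.
    lines = [line for line in io.StringIO(text) if not line.startswith("#")]
    for row in csv.DictReader(lines, delimiter="\t"):
        s_prefix, _, s_id = row["subject_id"].partition(":")
        o_prefix, _, o_id = row["object_id"].partition(":")
        # Compare prefixes case-insensitively, since casing varies by source.
        if s_prefix.lower() == subject_prefix and o_prefix.lower() == object_prefix:
            yield s_id, row["predicate_id"], o_id


pairs = list(select_pairs(SAMPLE, "drugcentral", "drugbank"))
```

Importing another vocabulary would then amount to calling the same helper with a different `object_prefix`, assuming the file encodes that vocabulary's CURIEs consistently.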
Update: this PR now filters out DrugCentral-DrugBank mappings that are already available, as determined by querying DrugCentral's PostgreSQL database.
Thanks @cthoyt! Let me know if/when the ingests hit any snags so I can fix the table.
Hi @cthoyt and @caufieldjh, thanks for working on this! Generally, Biomappings only includes mappings that aren't provided by any of the primary sources. I spot-checked 10 entries from the new additions to `mappings.tsv`, and in each case I found that DrugCentral already lists the given DrugBank ID on the drug's landing page. For instance, taking the first new entry:
drugbank DB00001 Lepirudin skos:exactMatch drugcentral 2995 lepirudin
https://drugcentral.org/drugcard/2995 provides:
Ideally, these existing mappings would be filtered out and only novel/missing mappings added.
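One way to picture that filtering step is as a set difference between the predicted pairs and the pairs the primary source already provides. The sketch below assumes the known pairs have already been exported from DrugCentral by some means (that part is not shown); the function name `filter_novel` and the second predicted row are hypothetical.

```python
def filter_novel(predicted, known_pairs):
    """Keep only predicted mappings whose (DrugBank, DrugCentral) pair is not already known."""
    known = set(known_pairs)
    return [m for m in predicted if (m[0], m[1]) not in known]


# (DrugBank ID, DrugCentral ID, label). The first row is the example from this
# thread; the second is a made-up placeholder, not a real mapping.
predicted = [
    ("DB00001", "2995", "lepirudin"),
    ("DB99999", "0000", "hypothetical drug"),
]

# Pairs already listed by DrugCentral itself (assumed to be exported beforehand).
known_pairs = {("DB00001", "2995")}

novel = filter_novel(predicted, known_pairs)
```

Matching on the full pair, instead of on either ID alone, keeps one-to-many cases intact: a DrugBank ID that DrugCentral already maps to one structure can still gain a novel mapping to another.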
For these sources, here is which primary mappings are available from each source to each of the others:
Source/target | CHEBI | CHEMBL | DrugBank | DrugCentral | PharmGKB.drug | ttd.drug |
---|---|---|---|---|---|---|
CHEBI to | Yes | No | No | No | No | |
CHEMBL to | Yes | Yes | Yes | Yes | No | |
DrugBank to | Yes | Yes | No | Yes | Yes | |
DrugCentral to | Yes | Yes | Yes | No | No | |
PharmGKB.drug to | Yes | No | Yes | No | Yes | |
ttd.drug to | Yes | No | No | No | No | |
where "Yes" means at least some mappings are available. It does not account for completeness, or for the fact that many of these mappings are one-to-many or many-to-one, since sources vary in size and specificity.