
PharmGKB parsing fixes

Open newgene opened this issue 11 months ago • 5 comments

As shown in this example:

https://mychem.info/v1/query?q=bromazin&fields=pharmgkb

  • generic_names and trade_names fields should be lists
  • dosing_guideline: check whether it can be a True/False boolean field
  • fields under the xrefs field do not seem to be parsed correctly
  • check the brand_mixtures field to see if it should also be a list

newgene avatar Mar 12 '24 21:03 newgene

I have two comparisons for you here: the first is the same record you linked above, and the second is a different record with different values.

Comparison for https://mychem.info/v1/query?q=bromazin&fields=pharmgkb

{
    "_id": "PA164760854",
    "pharmgkb": {
        "id": "PA164760854",
        "name": "bromodiphenhydramine",
        "generic_names": [
            "Ambodryl Hydrochloride",
            "Amodryl",
            "Bromanautine",
            "Bromazin",
            "Bromazine",
            "Bromazine hydrochloride",
            "Bromdiphenhydramine",
            "Bromdiphenhydramine hydrochloride",
            "Bromdiphenylhydramine Hydrochloride",
            "Bromodiphenhydramine hydrochloride",
        ],
        "trade_names": [
            "Bromo-Benadryl",
            "Bromo-Benzdryl",
            "Deserol",
            "Histabromamine",
            "Neo-Benadryl",
        ],
        "brand_mixtures": "Ambenyl Cough Syrup (Ammonium Chloride + Bromodiphenhydramine Hydrochloride + Codeine Phosphate + Diphenhydramine Hydrochloride + Potassium Guaiacol Sulphonate)",
        "type": "Drug",
        "xrefs": {
            "chebi": "CHEBI:59177",
            "chemspider": "2350",
            "dpd": "00469122",
            "drugbank": "DB01237",
            "pubchem": {"sid": 544844},
            "ttd": "DAP001072",
            "atc": "R06AA01",
            "mesh": "C011410",
            "ndf_rt": "N0000147733",
            "rxnorm": "19759",
            "umls": "C0054120",
        },
        "smiles": "CN(C)CCOC(C1=CC=CC=C1)C2=CC=C(C=C2)Br",
        "inchi": "InChI=1S/C17H20BrNO/c1-19(2)12-13-20-17(14-6-4-3-5-7-14)15-8-10-16(18)11-9-15/h3-11,17H,12-13H2,1-2H3",
        "dosing_guideline": False,
    },
}

  • generic_names and trade_names are now lists.
  • dosing_guideline is now a True/False boolean, based on the yes/no value.
  • xref parsing has been updated.
  • brand_mixtures is now a list; there is only 1 item here, though, so see the example below:

Comparison for https://mychem.info/v1/chem/MCGSCOLBFJQGHM-SCZZXKLOSA-N?fields=pharmgkb

{
    "_id": "PA448004",
    "pharmgkb": {
        "id": "PA448004",
        "name": "abacavir",
        "generic_names": ["ABC", "abacavir"],
        "trade_names": ["Epzicom", "Ziagen"],
        "brand_mixtures": [
            "Kivexa (Abacavir Sulfate + Lamivudine)",
            "Trizivir (Abacavir Sulfate + Lamivudine + Zidovudine)",
        ],
        "type": "Drug",
        "xrefs": {
            "cas": "188062-50-2",
            "chebi": "CHEBI:2360",
            "chemspider": "58649",
            "clinicaltrials": {"gov": "NCT00373945"},
            "dpd": "02240358",
            "drugbank": "DB01048",
            "dailymed": {"setid": "6a3b10fc-4b2a-45e3-16a1-ef79187a6dfb"},
            "kegg_compound": "C07624",
            "kegg_drug": "D07057",
            "ndc": "0173-0661-01",
            "pubchem": {"sid": 46505718},
            "ttd": "DAP000704",
            "url": "http://en.wikipedia.org/wiki/Abacavir",
            "atc": "J05AR13",
            "mesh": "C106538",
            "ndf_rt": "N0000022135",
            "rxnorm": "190521",
            "umls": "C0663655",
        },
        "smiles": "C1CC1NC2=NC(=NC3=C2N=CN3[C@@H]4C[C@@H](C=C4)CO)N",
        "inchi": "InChI=1S/C14H18N6O/c15-14-18-12(17-9-2-3-9)11-13(19-14)20(7-16-11)10-4-1-8(5-10)6-21/h1,4,7-10,21H,2-3,5-6H2,(H3,15,17,18,19)/t8-,10+/m1/s1",
        "dosing_guideline": True,
    },
}
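
For readers following along, the list and boolean changes described above could be sketched roughly as follows. This is a minimal illustration and not the code from the actual commit: the helper names (`as_list`, `yes_no_to_bool`, `normalize_record`) are hypothetical, and it assumes the parsed record holds either a scalar or a list for the name fields and a "yes"/"no" string for dosing_guideline.

```python
def as_list(value):
    """Wrap a scalar in a list; pass lists through; map empty values to []."""
    if value is None or value == "":
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def yes_no_to_bool(value):
    """Map a 'yes'/'no' string (case-insensitive) to True/False."""
    normalized = str(value).strip().lower()
    if normalized not in ("yes", "no"):
        raise ValueError(f"expected 'yes' or 'no', got {value!r}")
    return normalized == "yes"


def normalize_record(record, list_fields=("generic_names", "trade_names")):
    """Return a copy of the record with list and boolean fields normalized.

    The field names come from the issue; the input shapes are assumptions.
    """
    out = dict(record)
    for field in list_fields:
        if field in out:
            out[field] = as_list(out[field])
    if "dosing_guideline" in out:
        out["dosing_guideline"] = yes_no_to_bool(out["dosing_guideline"])
    return out
```

Note that this sketch always returns a list for the configured fields, whereas the first record above still shows a single-item brand_mixtures as a plain string, so brand_mixtures is deliberately left out of the default `list_fields`.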

DylanWelzel avatar Mar 12 '24 23:03 DylanWelzel

Commit https://github.com/biothings/mychem.info/commit/1ee8c290c2e7253c8e2229eba034fd7aea49c8ed fixes this issue, pending a new release.

DylanWelzel avatar Mar 13 '24 17:03 DylanWelzel

@DylanWelzel thanks for the quick fix. I noticed two more things I previously missed:

        "brand_mixtures": [
            "Kivexa (Abacavir Sulfate + Lamivudine)",
            "Trizivir (Abacavir Sulfate + Lamivudine + Zidovudine)",
        ],

Can we parse each mixture string into an object: {"brand_name": ..., "mixture": [...]}?

"clinicaltrials": {"gov": "NCT00373945"},

Can you verify whether there are any possible keys other than "gov" here? If not, we might be able to simplify this to "clinicaltrials_gov": "NCT00373945"
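
One way to check this could look like the sketch below. It is only an illustration under assumptions: `collect_subkeys` and `flatten_xref` are hypothetical helpers, and the documents are assumed to be plain dicts shaped like the records above (pharmgkb -> xrefs -> clinicaltrials).

```python
def collect_subkeys(docs, xref_name="clinicaltrials"):
    """Return every key seen under pharmgkb.xrefs.<xref_name> across docs."""
    seen = set()
    for doc in docs:
        value = doc.get("pharmgkb", {}).get("xrefs", {}).get(xref_name)
        if isinstance(value, dict):
            seen.update(value)
    return seen


def flatten_xref(xrefs, name="clinicaltrials", subkey="gov"):
    """Rename xrefs[name][subkey] to xrefs['<name>_<subkey>'].

    Only flattens when subkey is the sole key, so unexpected data is kept as-is.
    """
    out = dict(xrefs)
    value = out.get(name)
    if isinstance(value, dict) and set(value) == {subkey}:
        out[f"{name}_{subkey}"] = value[subkey]
        del out[name]
    return out
```

If `collect_subkeys` returns only {"gov"} over the full dataset, flattening to "clinicaltrials_gov" would lose no information.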

newgene avatar Mar 13 '24 19:03 newgene

I came across this example here: https://mychem.info/v1/chem/AVKUERGKIZMTKX-NJBDSQKTSA-N?fields=pharmgkb.brand_mixtures.

Synergistin Injectable Suspension (Ampicillin + Sulbactam (Sulbactam Benzathine))

How would we want this parsed into an object? Some possible options:

{
  "brand_name": "Synergistin Injectable Suspension",
  "mixture": ["Ampicillin", "Sulbactam", "Sulbactam Benzathine"]
}

or

{
  "brand_name": "Synergistin Injectable Suspension",
  "mixture": [
    "Ampicillin",
    "Sulbactam (Sulbactam Benzathine)"
  ]
}

DylanWelzel avatar Mar 18 '24 17:03 DylanWelzel

@DylanWelzel 2nd one

newgene avatar Mar 18 '24 17:03 newgene

The fix in commit 80c7683cb4b83b1ca5c30c5bcdeb011fab6bbc17 is now live with the updated changes. Example: https://mychem.info/v1/query?q=bromazin&fields=pharmgkb

        "brand_mixtures": {
          "brand_name": "Ambenyl Cough Syrup",
          "mixture": [
            "Ammonium Chloride",
            "Bromodiphenhydramine Hydrochloride",
            "Codeine Phosphate",
            "Diphenhydramine Hydrochloride",
            "Potassium Guaiacol Sulphonate"
          ]
        },
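
For reference, the agreed format (the second option, where nested parentheses stay attached to their ingredient) could be produced by something like the sketch below. This is not necessarily how the commit implements it: `parse_brand_mixture` is a hypothetical name, and the sketch assumes the brand name itself contains no parentheses, so the first "(" opens the ingredient list.

```python
def parse_brand_mixture(text):
    """Split 'Brand (A + B (C))' into a brand name and a list of ingredients.

    Ingredients are separated on '+' only at the top nesting level, so a
    parenthesised qualifier such as 'Sulbactam (Sulbactam Benzathine)' is
    kept as one item.
    """
    text = text.strip()
    open_idx = text.find("(")
    if open_idx == -1 or not text.endswith(")"):
        # No parenthesised ingredient list: keep the whole string as the name.
        return {"brand_name": text, "mixture": []}

    brand_name = text[:open_idx].strip()
    inner = text[open_idx + 1:-1]  # drop the outermost pair of parentheses

    mixture, current, depth = [], [], 0
    for char in inner:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "+" and depth == 0:
            mixture.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    mixture.append("".join(current).strip())

    return {"brand_name": brand_name, "mixture": [item for item in mixture if item]}
```

Called on the Synergistin string from earlier in the thread, this yields "Ampicillin" and "Sulbactam (Sulbactam Benzathine)" as the two mixture items. A brand name that itself contains parentheses would need extra handling.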

DylanWelzel avatar Jun 17 '24 23:06 DylanWelzel