
Node label mismatches in indication_paths.yaml: 28 indications have incorrect Drug/Disease labels

Open justaddcoffee opened this issue 1 month ago • 0 comments

Issue Description

Found 28 indications in indication_paths.yaml where the node labels don't match the actual entity types. This causes issues when trying to programmatically identify drug and disease nodes in the mechanism graphs.

Examples

Drug nodes mislabeled (5 cases)

  1. DB08896_MESH_D006528_1: 'regorafenib' (MESH:C559147) has label='Disease' (should be 'Drug')
  2. DB08903_MESH_D014397_1: 'bedaquiline' (MESH:C493870) has label='Disease' (should be 'Drug')
  3. DB08903_MESH_D018088_1: 'bedaquiline' (MESH:C493870) has label='Disease' (should be 'Drug')
  4. DB11608_MESH_D002836_1: 'eftrenonacog alfa' (MESH:C000599709) has label='Disease' (should be 'Drug')
  5. DB12184_MESH_D003865_2: 'gepirone' (MESH:C039979) has label='Disease' (should be 'Drug')

Disease nodes mislabeled (22 cases)

  1. DB01577_MESH_D001289_1: 'Attention deficit hyperactivity disorder' (MESH:D001289) has label='ChemicalSubstance' (should be 'Disease')
  2. DB04931_MESH_D046351_2: 'Erythropoietic protoporphyria' (MESH:D046351) has label='Drug' (should be 'Disease')
  3. DB09270_MESH_D006333_1: 'Congestive heart failure' (MESH:D006333) has label='Drug' (should be 'Disease')
  4. DB00971_MESH_D014010_1: 'Pityriasis versicolor' (MESH:D014010) has label='Drug' (should be 'Disease')
  5. DB00971_MESH_D012628_1: 'Seborrheic dermatitis' (MESH:D012628) has label='Drug' (should be 'Disease')
  6. DB00527_MESH_D011537_1: 'Itching of skin' (MESH:D011537) has label='Drug' (should be 'Disease')
  7. DB06273_MESH_D005871_1: 'Angiofollicular lymph node hyperplasia' (MESH:D005871) has label='Drug' (should be 'Disease')
  8. DB06273_MESH_D013700_1: 'Giant cell arteritis' (MESH:D013700) has label='Drug' (should be 'Disease')
  9. DB06273_MESH_D001172_1: 'Rheumatoid arthritis' (MESH:D001172) has label='Drug' (should be 'Disease')
  10. DB06273_MESH_D001171_1: 'Juvenile rheumatoid arthritis' (MESH:D001171) has label='Drug' (should be 'Disease')
  11. DB09299_MESH_D003866_1: 'Depressive disorder' (MESH:D003866) has label='Drug' (should be 'Disease')
  12. DB01281_MESH_D001171_1: 'Juvenile idiopathic arthritis' (MESH:D001171) has label='Drug' (should be 'Disease')
  13. DB01281_MESH_D001172_1: 'Rheumatoid arthritis' (MESH:D001172) has label='Drug' (should be 'Disease')
  14. DB06168_MESH_D056587_1: 'Cryopyrin associated periodic syndrome' (MESH:D056587) has label='Drug' (should be 'Disease')
  15. DB06168_MESH_C536657_1: 'TNF receptor-associated periodic fever syndrome (TRAPS)' (MESH:C536657) has label='Drug' (should be 'Disease')
  16. DB06168_MESH_D010505_1: 'Familial Mediterranean fever' (MESH:D010505) has label='Drug' (should be 'Disease')
  17. DB06701_MESH_D001289_1: 'Attention deficit hyperactivity disorder' (MESH:D001289) has label='ChemicalSubstance' (should be 'Disease')
  18. DB01086_MESH_D010612_1: 'Sore throat symptom' (MESH:D010612) has label='Drug' (should be 'Disease')
  19. DB01086_MESH_D003371_1: 'Cough' (MESH:D003371) has label='Drug' (should be 'Disease')
  20. DB01086_MESH_D011537_1: 'Itching of skin' (MESH:D011537) has label='Drug' (should be 'Disease')
  21. DB01086_MESH_D014008_1: 'Tinea pedis' (MESH:D014008) has label='Drug' (should be 'Disease')
  22. DB00645_MESH_D011537_1: 'Itching of skin' (MESH:D011537) has label='Drug' (should be 'Disease')

How to Identify

These mismatches can be detected by matching each node's id against the drug_mesh and disease_mesh IDs in the graph metadata and checking that the matching node is labeled 'Drug' or 'Disease', respectively:

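import yaml  # PyYAML

# Load the indication records (path assumed relative to the repo root)
with open("indication_paths.yaml") as f:
    indications = yaml.safe_load(f)

for indication in indications:
    graph = indication['graph']
    drug_mesh = graph['drug_mesh']
    disease_mesh = graph['disease_mesh']

    for node in indication['nodes']:
        # The node whose id equals drug_mesh should be labeled 'Drug'
        if node['id'] == drug_mesh and node['label'] != 'Drug':
            print(f"Drug mislabeled in {graph['_id']}")
        # The node whose id equals disease_mesh should be labeled 'Disease'
        if node['id'] == disease_mesh and node['label'] != 'Disease':
            print(f"Disease mislabeled in {graph['_id']}")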

Suggested Fix

Update the label field for these nodes to match their actual entity type based on the drug_mesh/disease_mesh metadata.
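
As a rough illustration of the suggested fix, the relabeling could be done in memory along these lines. This is only a sketch: `fix_labels` is a hypothetical helper name, the sample record uses made-up placeholder IDs rather than real DrugMechDB entries, and it assumes each record has the `graph` / `nodes` shape used in the detection snippet above. Writing the result back to indication_paths.yaml (and preserving its formatting) is left out.

```python
def fix_labels(indications):
    """Relabel nodes whose id matches the graph's drug_mesh/disease_mesh.

    Mutates the records in place and returns a list of
    (graph_id, node_id, old_label, new_label) tuples for review.
    """
    changes = []
    for indication in indications:
        graph = indication["graph"]
        # Expected label for each endpoint id; .get() tolerates missing keys
        expected = {
            graph.get("drug_mesh"): "Drug",
            graph.get("disease_mesh"): "Disease",
        }
        for node in indication["nodes"]:
            new_label = expected.get(node["id"])
            if new_label is not None and node["label"] != new_label:
                changes.append((graph["_id"], node["id"], node["label"], new_label))
                node["label"] = new_label
    return changes


# Hypothetical record with placeholder IDs (not taken from the real file)
sample = [{
    "graph": {
        "_id": "EXAMPLE_1",
        "drug_mesh": "MESH:EXAMPLE_DRUG",
        "disease_mesh": "MESH:EXAMPLE_DISEASE",
    },
    "nodes": [
        {"id": "MESH:EXAMPLE_DRUG", "name": "example drug", "label": "Disease"},
        {"id": "UniProt:EXAMPLE", "name": "example target", "label": "Protein"},
        {"id": "MESH:EXAMPLE_DISEASE", "name": "example disease", "label": "Drug"},
    ],
}]

for change in fix_labels(sample):
    print(change)
```

Returning the list of changes rather than silently rewriting makes it easier to compare the result against the cases listed in this issue before committing anything.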

Additional Context

This issue was discovered while implementing a path embedding classifier that extracts drug-disease paths from the mechanism graphs. The current workaround is to use the metadata IDs to identify drug/disease nodes instead of relying on labels, but fixing the labels would improve data consistency.
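
For reference, the workaround amounts to something like the following. It is a simplified sketch rather than the classifier's actual code: `find_endpoints` is a hypothetical name, and the record below uses placeholder IDs with deliberately wrong labels to show that the lookup ignores them.

```python
def find_endpoints(indication):
    """Return (drug_node, disease_node) using the graph metadata IDs.

    Labels are ignored entirely; a missing endpoint comes back as None.
    """
    graph = indication["graph"]
    by_id = {node["id"]: node for node in indication["nodes"]}
    drug_node = by_id.get(graph.get("drug_mesh"))
    disease_node = by_id.get(graph.get("disease_mesh"))
    return drug_node, disease_node


# Hypothetical record with placeholder IDs and mislabeled endpoints
record = {
    "graph": {
        "_id": "EXAMPLE_1",
        "drug_mesh": "MESH:EXAMPLE_DRUG",
        "disease_mesh": "MESH:EXAMPLE_DISEASE",
    },
    "nodes": [
        {"id": "MESH:EXAMPLE_DRUG", "name": "example drug", "label": "Disease"},
        {"id": "MESH:EXAMPLE_DISEASE", "name": "example disease", "label": "Drug"},
    ],
}

drug, disease = find_endpoints(record)
print(drug["name"], "->", disease["name"])
```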

justaddcoffee, Nov 10 '25 17:11