Node label mismatches in indication_paths.yaml: 28 indications have incorrect Drug/Disease labels
Issue Description
Found 28 indications in indication_paths.yaml where the node labels don't match the actual entity types. This causes issues when trying to programmatically identify drug and disease nodes in the mechanism graphs.
Examples
Drug nodes mislabeled (5 cases)
- DB08896_MESH_D006528_1: 'regorafenib' (MESH:C559147) has label='Disease' (should be 'Drug')
- DB08903_MESH_D014397_1: 'bedaquiline' (MESH:C493870) has label='Disease' (should be 'Drug')
- DB08903_MESH_D018088_1: 'bedaquiline' (MESH:C493870) has label='Disease' (should be 'Drug')
- DB11608_MESH_D002836_1: 'eftrenonacog alfa' (MESH:C000599709) has label='Disease' (should be 'Drug')
- DB12184_MESH_D003865_2: 'gepirone' (MESH:C039979) has label='Disease' (should be 'Drug')
Disease nodes mislabeled (22 cases)
- DB01577_MESH_D001289_1: 'Attention deficit hyperactivity disorder' (MESH:D001289) has label='ChemicalSubstance' (should be 'Disease')
- DB04931_MESH_D046351_2: 'Erythropoietic protoporphyria' (MESH:D046351) has label='Drug' (should be 'Disease')
- DB09270_MESH_D006333_1: 'Congestive heart failure' (MESH:D006333) has label='Drug' (should be 'Disease')
- DB00971_MESH_D014010_1: 'Pityriasis versicolor' (MESH:D014010) has label='Drug' (should be 'Disease')
- DB00971_MESH_D012628_1: 'Seborrheic dermatitis' (MESH:D012628) has label='Drug' (should be 'Disease')
- DB00527_MESH_D011537_1: 'Itching of skin' (MESH:D011537) has label='Drug' (should be 'Disease')
- DB06273_MESH_D005871_1: 'Angiofollicular lymph node hyperplasia' (MESH:D005871) has label='Drug' (should be 'Disease')
- DB06273_MESH_D013700_1: 'Giant cell arteritis' (MESH:D013700) has label='Drug' (should be 'Disease')
- DB06273_MESH_D001172_1: 'Rheumatoid arthritis' (MESH:D001172) has label='Drug' (should be 'Disease')
- DB06273_MESH_D001171_1: 'Juvenile rheumatoid arthritis' (MESH:D001171) has label='Drug' (should be 'Disease')
- DB09299_MESH_D003866_1: 'Depressive disorder' (MESH:D003866) has label='Drug' (should be 'Disease')
- DB01281_MESH_D001171_1: 'Juvenile idiopathic arthritis' (MESH:D001171) has label='Drug' (should be 'Disease')
- DB01281_MESH_D001172_1: 'Rheumatoid arthritis' (MESH:D001172) has label='Drug' (should be 'Disease')
- DB06168_MESH_D056587_1: 'Cryopyrin associated periodic syndrome' (MESH:D056587) has label='Drug' (should be 'Disease')
- DB06168_MESH_C536657_1: 'TNF receptor-associated periodic fever syndrome (TRAPS)' (MESH:C536657) has label='Drug' (should be 'Disease')
- DB06168_MESH_D010505_1: 'Familial Mediterranean fever' (MESH:D010505) has label='Drug' (should be 'Disease')
- DB06701_MESH_D001289_1: 'Attention deficit hyperactivity disorder' (MESH:D001289) has label='ChemicalSubstance' (should be 'Disease')
- DB01086_MESH_D010612_1: 'Sore throat symptom' (MESH:D010612) has label='Drug' (should be 'Disease')
- DB01086_MESH_D003371_1: 'Cough' (MESH:D003371) has label='Drug' (should be 'Disease')
- DB01086_MESH_D011537_1: 'Itching of skin' (MESH:D011537) has label='Drug' (should be 'Disease')
- DB01086_MESH_D014008_1: 'Tinea pedis' (MESH:D014008) has label='Drug' (should be 'Disease')
- DB00645_MESH_D011537_1: 'Itching of skin' (MESH:D011537) has label='Drug' (should be 'Disease')
How to Identify
These mismatches can be detected by comparing the node labels to the drug_mesh and disease_mesh IDs in the graph metadata:
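import yaml  # PyYAML

# Assumes the file's top level is a list of indication records.
with open('indication_paths.yaml') as f:
    indications = yaml.safe_load(f)

for indication in indications:
    drug_mesh = indication['graph']['drug_mesh']
    disease_mesh = indication['graph']['disease_mesh']
    for node in indication['nodes']:
        # The node whose ID matches the metadata should carry the matching label.
        if node['id'] == drug_mesh and node['label'] != 'Drug':
            print(f"Drug mislabeled in {indication['graph']['_id']}")
        if node['id'] == disease_mesh and node['label'] != 'Disease':
            print(f"Disease mislabeled in {indication['graph']['_id']}")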
Suggested Fix
Update the label field for these nodes to match their actual entity type based on the drug_mesh/disease_mesh metadata.
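One way the fix could be scripted is sketched below. It is only a sketch: `fix_labels` is a hypothetical helper name, the record layout (`graph.drug_mesh`, `graph.disease_mesh`, `graph._id`, and per-node `id`/`label`) is taken from the detection snippet in this issue, and the toy record assumes that the `disease_mesh` value mirrors the MESH ID embedded in the graph `_id`. It has not been run against the real file.

```python
def fix_labels(indications):
    """Relabel drug/disease nodes in place.

    Returns a list of (graph_id, node_id, old_label, new_label) tuples so the
    changes can be reviewed before anything is written back to disk.
    """
    changes = []
    for indication in indications:
        graph = indication["graph"]
        # Expected label for each endpoint, keyed by the metadata IDs.
        expected = {
            graph["drug_mesh"]: "Drug",
            graph["disease_mesh"]: "Disease",
        }
        for node in indication["nodes"]:
            want = expected.get(node["id"])
            if want is not None and node["label"] != want:
                changes.append((graph["_id"], node["id"], node["label"], want))
                node["label"] = want
    return changes


# Toy record shaped like the first example in this issue. The disease_mesh
# value is an assumption inferred from the graph _id; other fields are omitted.
sample = [{
    "graph": {
        "_id": "DB08896_MESH_D006528_1",
        "drug_mesh": "MESH:C559147",
        "disease_mesh": "MESH:D006528",
    },
    "nodes": [
        {"id": "MESH:C559147", "label": "Disease", "name": "regorafenib"},
        {"id": "MESH:D006528", "label": "Disease"},
    ],
}]

changes = fix_labels(sample)
for graph_id, node_id, old, new in changes:
    print(f"{graph_id}: {node_id} {old!r} -> {new!r}")
```

Returning the change list rather than silently rewriting makes it easy to check that the corrections match the cases listed above before committing them.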
Additional Context
This issue was discovered while implementing a path embedding classifier that extracts drug-disease paths from the mechanism graphs. The current workaround is to use the metadata IDs to identify drug/disease nodes instead of relying on labels, but fixing the labels would improve data consistency.
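For reference, the workaround looks roughly like the sketch below. `find_endpoints` is a hypothetical name, not the classifier's actual code, and the sample record again assumes that `disease_mesh` mirrors the MESH ID in the graph `_id`.

```python
def find_endpoints(indication):
    """Return (drug_node, disease_node), matched by metadata ID, ignoring labels.

    Either element is None if no node carries the corresponding ID.
    """
    graph = indication["graph"]
    by_id = {node["id"]: node for node in indication["nodes"]}
    return by_id.get(graph["drug_mesh"]), by_id.get(graph["disease_mesh"])


# Toy record shaped like one of the bedaquiline examples in this issue.
# The disease_mesh value is an assumption; intermediate nodes are omitted.
sample_indication = {
    "graph": {
        "_id": "DB08903_MESH_D014397_1",
        "drug_mesh": "MESH:C493870",
        "disease_mesh": "MESH:D014397",
    },
    "nodes": [
        {"id": "MESH:C493870", "label": "Disease", "name": "bedaquiline"},
        {"id": "MESH:D014397", "label": "Disease"},
    ],
}

drug_node, disease_node = find_endpoints(sample_indication)
print(drug_node["name"], "->", disease_node["id"])
```

Because the lookup never reads `label`, it returns the correct endpoints even for the mislabeled records.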