Erica Wood

Results 183 comments of Erica Wood

I am not sure what to do about this: ``` {"('U000006', 'CCPSS')": {"cuis": ["C0740670"], "names": ["Y|UNKNOWN/MISC PROBLEM"]}} {"('U000006', 'COSTAR')": {"cuis": ["C0004766", "C0149612"], "names": ["Y|ABNORMAL STRESS TEST", "Y|BARTHOLIN'S GLAND ABSCESS"]}} {"('U000006',...

Here are all of the sets of TUIs that make up the `DRUGBANK` UMLS nodes: ``` [ "['T121', 'T125', 'T127']", "['T109', 'T120', 'T121', 'T130']", "['T121', 'T131', 'T197']", "['T114', 'T116', 'T121']",...

```
ubuntu@ip-172-31-50-116:~/kg2-build$ grep "has category inconsistency" umls_node_ids.log | wc -l
823502
ubuntu@ip-172-31-50-116:~/kg2-build$ grep "has name inconsistency" umls_node_ids.log | wc -l
172104
```
Given the current code, this is where inconsistencies...
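The two `grep ... | wc -l` counts above can also be gathered in one pass over the log. This is a minimal sketch, not the project's own tooling: the function name `count_inconsistencies` is hypothetical, and the only things taken from the comment are the two marker strings and the log file name.

```python
from collections import Counter
from typing import Iterable

# Marker substrings copied from the grep commands in the comment.
MARKERS = ("has category inconsistency", "has name inconsistency")


def count_inconsistencies(lines: Iterable[str]) -> Counter:
    """Count, per marker, how many lines contain it (like grep MARKER | wc -l)."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

To run it against the real log, something like `with open("umls_node_ids.log", encoding="utf-8") as log: print(count_inconsistencies(log))` should work, since a file object iterates line by line.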

Here are all of the problem node category pairings: ``` { "biolink:Activity---biolink:Agent": 488, "biolink:Activity---biolink:AnatomicalEntity": 1, "biolink:Activity---biolink:BiologicalEntity": 3, "biolink:Activity---biolink:BiologicalProcess": 2, "biolink:Activity---biolink:ClinicalIntervention": 179, "biolink:Activity---biolink:Device": 1, "biolink:Activity---biolink:InformationContentEntity": 26, "biolink:Activity---biolink:NamedThing": 168, "biolink:Activity---biolink:Phenomenon": 14, "biolink:Activity---biolink:PhenotypicFeature": 2,...

To do on this issue:
- Verify TUI mappings
- Verify each source of node looks right
- Source nodes
- Update dates
- Verify edges to some degree
- ...

In order to triage, here is the discrepancy map sorted by value: ``` biolink:Drug---biolink:ChemicalEntity: 187984 biolink:Polypeptide---biolink:NamedThing: 111749 biolink:NamedThing---biolink:BiologicalEntity: 85702 biolink:GrossAnatomicalStructure---biolink:AnatomicalEntity: 75418 biolink:Protein---biolink:Polypeptide: 60812 biolink:NamedThing---biolink:Gene: 54922 biolink:DiseaseOrPhenotypicFeature---biolink:NamedThing: 35529 biolink:Drug---biolink:NamedThing: 34956 biolink:NamedThing---biolink:InformationContentEntity:...
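Sorting a pairing-to-count map by value, as in the triage list above, can be sketched as follows. The helper name `sort_by_count` is hypothetical; the sample entries are the first three pairs quoted in the comment, deliberately given out of order.

```python
from typing import Dict, List, Tuple


def sort_by_count(discrepancies: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (pairing, count) tuples ordered from most to least frequent."""
    return sorted(discrepancies.items(), key=lambda item: item[1], reverse=True)


# Counts copied from the comment, shuffled to show the sort doing its job.
sample = {
    "biolink:NamedThing---biolink:BiologicalEntity": 85702,
    "biolink:Drug---biolink:ChemicalEntity": 187984,
    "biolink:Polypeptide---biolink:NamedThing": 111749,
}

for pairing, count in sort_by_count(sample):
    print(f"{pairing}: {count}")
```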

Here are the sources causing the issues for the top 10 rows: ``` { "biolink:DiseaseOrPhenotypicFeature---biolink:NamedThing": { "OMIM": 35061, "UMLS": 451, "MESH": 17 }, "biolink:DiseaseOrPhenotypicFeature---biolink:PhenotypicFeature": { "NCIT": 17843, "HP": 6731...

Examples of "discrepancies" that actually seem like improvements: (first category in list is new category, second one is old category) ``` https://identifiers.org/umls:C5706686 UMLS:C5706686 with name: Pertuzumab Zuvotolimod has category inconsistency:...

For the top ten node category discrepancy types, here are 20 nodes from each category pair, equally spaced through the sample. Based on this, I think we need...
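The comment does not show how the 20 nodes were picked, so the following is only one plausible reading of "equally spaced": take every (n / k)-th item from the list of affected nodes. The function name `evenly_spaced_sample` and its parameters are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def evenly_spaced_sample(items: Sequence[T], k: int = 20) -> List[T]:
    """Pick up to k items at evenly spaced positions, starting at index 0."""
    n = len(items)
    if k <= 0:
        return []
    if n <= k:
        # Fewer items than requested: return them all.
        return list(items)
    # Integer arithmetic keeps indices in range and strictly increasing when n > k.
    return [items[i * n // k] for i in range(k)]
```

For a pair with 100 affected nodes this picks indices 0, 5, 10, and so on up to 95, which spreads the examples across the whole list rather than clustering them at the start.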

Here are the sources with name inconsistencies: ``` { "UMLS": 103711, "HGNC": 19853, "MESH": 13492, "OMIM": 6812, "DRUGBANK": 1338, "GO": 1199, "NDDF": 932, "NCIT": 372, "PDQ": 367, "ATC": 327, "ICD9":...