Subject node attributes

Open acevedol opened this issue 3 years ago • 2 comments

KG2 needs to start including attributes per the biolink model.

After discussing with Steve, we agreed to start by focusing on a single use case. We found a triple, CHEBI:94714, RO:0000087, CHEBI:49167, which translates to doxofylline, related to, anti-asthmatic drug. In ChEBI, doxofylline has the role anti-asthmatic drug. I reviewed this with Sierra Moxon, and as a result has_chemical_role was added as a node property to the biolink model. It is a multi-valued property because a drug can have multiple roles.

The triple above needs to be handled via predicate-remap.yaml so that the triple is discarded and the role is added to a new has_chemical_role property on the subject node. kg2_util.py already has an example, has_biological_sequence, of how to add a new property to the node.
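As a rough illustration of the intended behavior (not the actual KG2 code), the sketch below folds role triples into a multi-valued has_chemical_role property on the subject node and drops the triple. The function name `fold_roles_into_nodes`, the dict layout of nodes and edges, and the choice to store the object CURIE rather than its name are all assumptions made for the example.

```python
from typing import Dict, List

# Predicate from the example triple in this issue.
ROLE_PREDICATE = "RO:0000087"


def fold_roles_into_nodes(nodes: Dict[str, dict], edges: List[dict]) -> List[dict]:
    """Move role edges onto their subject nodes; return the edges that remain.

    Hypothetical helper: `nodes` maps a CURIE to a node dict, and each edge
    is a dict with "subject", "predicate", and "object" keys.
    """
    kept_edges = []
    for edge in edges:
        subject = edge["subject"]
        if edge["predicate"] == ROLE_PREDICATE and subject in nodes:
            # Multi-valued property: a drug can have several roles.
            roles = nodes[subject].setdefault("has_chemical_role", [])
            if edge["object"] not in roles:
                roles.append(edge["object"])
            # The edge is discarded by not adding it to kept_edges.
        else:
            kept_edges.append(edge)
    return kept_edges


example_nodes = {"CHEBI:94714": {"name": "doxofylline"}}
example_edges = [
    {"subject": "CHEBI:94714", "predicate": "RO:0000087", "object": "CHEBI:49167"},
]
remaining_edges = fold_roles_into_nodes(example_nodes, example_edges)
```

Whether the property should hold the role's CURIE or its label is an open choice here; the sketch uses the CURIE only to keep the example short.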

Upon review, we also noticed an error in kg_json_to_tsv.py: not all of the node properties are being loaded into the TSV. The new node property will also need to be included there.
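To make the TSV concern concrete, here is a minimal sketch of writing node properties, including a list-valued one, to TSV. It does not reproduce kg_json_to_tsv.py; the column list, the `;` list delimiter, and the function name `nodes_to_tsv` are assumptions chosen for illustration.

```python
import csv
import io
from typing import List

# Hypothetical column set; the real header in kg_json_to_tsv.py is larger.
NODE_COLUMNS = ["id", "name", "has_chemical_role"]
LIST_COLUMNS = {"has_chemical_role"}
LIST_DELIMITER = ";"  # assumed delimiter for multi-valued properties


def nodes_to_tsv(nodes: List[dict]) -> str:
    """Serialize nodes to TSV text, joining list-valued columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(NODE_COLUMNS)
    for node in nodes:
        row = []
        for column in NODE_COLUMNS:
            value = node.get(column, "")
            if column in LIST_COLUMNS:
                # A missing or empty list becomes an empty cell.
                value = LIST_DELIMITER.join(value or [])
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


tsv_text = nodes_to_tsv(
    [{"id": "CHEBI:94714", "name": "doxofylline", "has_chemical_role": ["CHEBI:49167"]}]
)
```

Driving both the header row and the data rows from one column list is one way to avoid the kind of mismatch described above, where a property exists on the node but never reaches the TSV.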

I also need to make updates to _load_kg2pre_tsv for the KG2c build, and to any other places where kg2pre node properties are defined in KG2 and KG2c.

acevedol, Apr 13 '22 23:04

Work for this ticket is in branch issue-199

acevedol, Apr 13 '22 23:04

I'm not sure how much needs to be changed in kg_json_to_tsv.py. I added has_chemical_role as a list attribute to the header row, but the node properties do not appear to be manually defined. I am generating a test-kg2 to check this.

acevedol, Apr 21 '22 20:04