Start following 'best practice' edge merging
we should update Expand to adhere to 'best practice' edge merging per the Translator Architecture guidelines.

right now ARAX does no edge merging; that is, every edge obtained from every KP is left separate. but because we do not consider the 'primary knowledge source' in edge keys, we could be losing edges returned from KPs: if a KP returns multiple edges with the same subject/predicate/object/qualifiers but different primary knowledge sources, we currently retain only one of them.
I think the things we'll need to update are:
- [ ] update how we create ARAX edge keys (here); no longer include the KP name, and instead replace it with all 'primary'/'original' knowledge sources; if there's more than one (happens with KG2), we should order them and join them into one string
  - to find primary/original knowledge sources, inspect `edge.attributes` to look for attributes with an `attribute_type_id` of `biolink:primary_knowledge_source` or `biolink:original_knowledge_source`
    - note: `biolink:original_knowledge_source` has apparently been deprecated in Biolink 3.0 in favor of `biolink:primary_knowledge_source`, but we should still check for `biolink:original_knowledge_source` as well, in case KPs haven't updated that yet
- [ ] update the add_edge() method, so that it merges edge attributes; if two attributes match on all properties, only retain one of them
- [ ] when creating edge attributes for KG2 upstream knowledge sources, give them an `attribute_type_id` of `biolink:primary_knowledge_source`, instead of `biolink:knowledge_source` (here)
- [ ] we'll have to figure out what to do when an edge has no primary or original knowledge source listed... I'm not sure that's common? but we'll probably want to handle it. maybe we should default to using the KP name in that case?
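
a rough sketch of what the edge-key and attribute-merging items in the checklist could look like, including the KP-name fallback. it assumes edges and attributes are plain dicts rather than the real TRAPI model objects, and `get_primary_sources`, `make_edge_key`, `merge_attributes`, and the `--`-joined key layout are all hypothetical (not the existing ARAX code):

```python
import json
from typing import Any, Dict, List

# Attribute types that say where an edge's knowledge originally came from.
SOURCE_TYPE_IDS = {
    "biolink:primary_knowledge_source",
    "biolink:original_knowledge_source",  # deprecated, but KPs may still send it
}


def get_primary_sources(attributes: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted, de-duplicated primary/original knowledge sources."""
    sources = set()
    for attribute in attributes:
        if attribute.get("attribute_type_id") in SOURCE_TYPE_IDS:
            value = attribute.get("value")
            # A value may be a single string or a list of strings.
            values = value if isinstance(value, list) else [value]
            sources.update(str(v) for v in values if v)
    return sorted(sources)


def make_edge_key(edge: Dict[str, Any], kp_name: str) -> str:
    """Build a key from subject/predicate/object/qualifiers/sources (assumed layout)."""
    sources = get_primary_sources(edge.get("attributes") or [])
    # Fall back to the KP name when no primary/original source is listed.
    source_part = ",".join(sources) if sources else kp_name
    qualifiers = sorted(
        f"{q['qualifier_type_id']}={q['qualifier_value']}"
        for q in edge.get("qualifiers") or []
    )
    parts = [edge["subject"], edge["predicate"], edge["object"],
             ",".join(qualifiers), source_part]
    return "--".join(parts)


def merge_attributes(existing: List[Dict[str, Any]],
                     incoming: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Combine two attribute lists, keeping one copy of attributes that match on all properties."""
    merged = []
    seen = set()
    for attribute in existing + incoming:
        # Serialize with sorted keys so property order does not affect matching.
        fingerprint = json.dumps(attribute, sort_keys=True, default=str)
        if fingerprint not in seen:
            seen.add(fingerprint)
            merged.append(attribute)
    return merged
```

with a key like this, two edges that share subject/predicate/object/qualifiers but have different primary knowledge sources get different keys (so neither is dropped), while edges that agree on everything collide and can have their attributes combined via something like `merge_attributes` inside add_edge().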
we can create a new branch for this issue (branched off of the current `master`).
note: because you will be making changes to the `KG2Querier` code, to test this issue you'll need to locally set `force_local = True` in `ARAX_expander.py`, to make your machine act as the KG2 API