Start following 'best practice' edge merging

Open amykglen opened this issue 2 years ago • 1 comments

we should update Expand to adhere to 'best practice' edge merging per the Translator Architecture guidelines:

[screenshot: edge merging section of the Translator Architecture guidelines]

right now ARAX does no edge merging; that is, every edge obtained from every KP is left separate. but because our edge keys do not include the 'primary knowledge source', we could be losing edges returned from KPs: if a KP returns multiple edges with the same subject/predicate/object/qualifiers but different primary knowledge sources, we currently retain only one of them.

I think the things we'll need to update are:

  • [ ] update how we create ARAX edge keys (here); no longer include the KP name, and instead replace it with all 'primary'/'original' knowledge sources; if there's more than one (happens with KG2), we should order them and join them into one string
    • to find primary/original knowledge sources, inspect edge.attributes to look for attributes with an attribute_type_id of biolink:primary_knowledge_source or biolink:original_knowledge_source
    • note: biolink:original_knowledge_source has apparently been deprecated in Biolink 3.0 in favor of biolink:primary_knowledge_source, but we should still check for biolink:original_knowledge_source as well, in case KPs haven't updated that yet
  • [ ] update the add_edge() method, so that it merges edge attributes; if two attributes match on all properties, only retain one of them
  • [ ] when creating edge attributes for KG2 upstream knowledge sources, give them an attribute_type_id of biolink:primary_knowledge_source, instead of biolink:knowledge_source (here)
  • [ ] we'll have to figure out what to do when an edge has no primary or original knowledge source listed... I'm not sure that's common? but we'll probably want to handle it. maybe we should default to using the KP name in that case?
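
to make the first, second, and last items concrete, here is a rough sketch of what the key building and attribute merging could look like. it is only an illustration, not the actual Expand code: it assumes edges and attributes are plain dicts shaped like TRAPI JSON (the real code works with TRAPI model objects), and the function names (`get_primary_sources`, `make_edge_key`, `merge_attributes`) and the `--` separator are hypothetical.

```python
from __future__ import annotations

import json

# Attribute types that identify an edge's primary source; the deprecated
# "original" type is still checked in case a KP has not updated yet.
PRIMARY_SOURCE_TYPES = {
    "biolink:primary_knowledge_source",
    "biolink:original_knowledge_source",
}


def get_primary_sources(edge: dict) -> list[str]:
    """Return the edge's primary/original knowledge sources, sorted and de-duplicated."""
    sources = set()
    for attribute in edge.get("attributes") or []:
        if attribute.get("attribute_type_id") in PRIMARY_SOURCE_TYPES:
            value = attribute.get("value")
            # A value may be a single string or a list (e.g. KG2 edges).
            values = value if isinstance(value, list) else [value]
            sources.update(str(v) for v in values if v)
    return sorted(sources)


def make_edge_key(edge: dict, kp_name: str) -> str:
    """Build a key from subject/predicate/object/qualifiers/primary sources.

    Falls back to the KP name when no primary source is listed (one possible
    answer to the open question in the last checklist item).
    """
    sources = get_primary_sources(edge) or [kp_name]
    qualifiers = sorted(
        f"{q.get('qualifier_type_id')}={q.get('qualifier_value')}"
        for q in edge.get("qualifiers") or []
    )
    parts = [
        edge["subject"],
        edge["predicate"],
        edge["object"],
        ";".join(qualifiers),
        ",".join(sources),
    ]
    return "--".join(parts)


def merge_attributes(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Combine two attribute lists, keeping one copy of attributes that match on all properties."""
    seen = set()
    merged = []
    for attribute in list(existing) + list(incoming):
        # Serializing with sorted keys gives an order-independent fingerprint.
        fingerprint = json.dumps(attribute, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            merged.append(attribute)
    return merged
```

with something like this, `add_edge()` could look up the key from `make_edge_key()` and, if an edge with that key already exists, replace its attributes with `merge_attributes(existing_edge_attributes, new_edge_attributes)` instead of dropping the new edge.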

we can create a new branch for this issue (branched off of the current master).

note: because you will be making changes to the KG2Querier code, to test this issue you'll need to locally set force_local = True in ARAX_expander.py, to make your machine act as the KG2 API

amykglen avatar Jan 18 '23 21:01 amykglen