
Adjusting Provenance For Edge Merging

Open uhbrar opened this issue 2 years ago • 0 comments

I recently made a proposal on how edges might be merged: https://github.com/NCATSTranslator/TranslatorArchitecture/pull/70

With edge merging, there are some necessary changes to consider in order to maintain provenance information. In the proposal, I suggest a graph-like solution that maintains the full tree of paths an edge might take. This method would make it very clear which services an edge passed through, but it is not the only option. Essentially, the schema would allow a service to add provenance that also lists where it received the edge from, in the form of a directed graph. The proposal also suggests moving provenance out of attributes, although it is possible that it could still use the same attribute object, with some modifications.
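To make the idea concrete, here is a minimal Python sketch of provenance kept as a directed graph. It is not the schema from the linked proposal: the `ProvenanceGraph` class, its method names, and the `infores:` identifiers below are all hypothetical and chosen only for illustration. An arc from A to B means that service B received the edge from A, and merging two edges is modeled as a union of their provenance graphs.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical provenance store: an arc A -> B means B received the edge from A."""

    def __init__(self):
        self.services = set()
        self.downstream = defaultdict(set)  # upstream service -> services it fed

    def add_hop(self, service, received_from=None):
        """Record that `service` handled the edge, optionally naming its upstream."""
        self.services.add(service)
        if received_from is not None:
            self.services.add(received_from)
            self.downstream[received_from].add(service)

    def merge(self, other):
        """Return a new graph holding the union of both provenance graphs."""
        merged = ProvenanceGraph()
        for graph in (self, other):
            merged.services |= graph.services
            for upstream, receivers in graph.downstream.items():
                merged.downstream[upstream] |= receivers
        return merged

    def roots(self):
        """Services with no recorded upstream, i.e. candidate original sources."""
        fed = {s for receivers in self.downstream.values() for s in receivers}
        return self.services - fed

    def paths_to(self, target):
        """List every root-to-target path, skipping any node already on the path."""
        upstream = defaultdict(set)
        for source, receivers in self.downstream.items():
            for receiver in receivers:
                upstream[receiver].add(source)

        def walk(node, seen):
            parents = sorted(p for p in upstream.get(node, ()) if p not in seen)
            if not parents:
                return [[node]]
            return [path + [node] for p in parents for path in walk(p, seen | {p})]

        return walk(target, {target})


# Two copies of the same edge reach one service by different routes
# (all identifiers are made up for this example).
via_kp1 = ProvenanceGraph()
via_kp1.add_hop("infores:kp-1", received_from="infores:source-a")
via_kp1.add_hop("infores:ara-x", received_from="infores:kp-1")

via_kp2 = ProvenanceGraph()
via_kp2.add_hop("infores:kp-2", received_from="infores:source-b")
via_kp2.add_hop("infores:ara-x", received_from="infores:kp-2")

merged = via_kp1.merge(via_kp2)
paths = merged.paths_to("infores:ara-x")
```

In this sketch the merged edge keeps both routes, so `paths` contains one path starting at `infores:source-a` and one starting at `infores:source-b`. Whether a union like this loses information in other cases is exactly the open question.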

The question remains whether this is an effective model for provenance, or whether information is still lost. If it is effective, we need to determine what changes to provenance would facilitate it, and whether any changes need to be made to the attribute object.

Additionally, the distinction between "primary_knowledge_source" and "original_knowledge_source" needs to be clarified, since this proposal assumes that they are different. If they can be equated, it should be decided which one to use when services report one or the other.

@mbrush

uhbrar, Apr 28 '22 17:04