guac
Discuss how to handle identifiers and duplicate scenarios
Some entities may have multiple identifiers. Let's figure out the best way to handle them, especially when merging nodes and when inserting new edges or relating new information. A related tricky question is how to treat empty identifier fields, and whether to keep a list of identifiers on one node or to use multiple nodes.
https://github.com/guacsec/guac/pull/107#discussion_r979458261
FYI: @pxp928 @mlieberman85 @mihaimaruseac
I think this is the entity resolution problem, which can be a very deep hole. My advice is to record the point in time from which you knew that A and B are the same entity. That is: before the entities are resolved, you treat them as separate. After resolving them, you merge them. But after merging you can still recreate historical queries as if they hadn't been merged, when this is necessary.
Sounds like a hassle, and it is, but the alternative is being unable to audit the history of the graph by comparing queries made today to the results of queries made in the past.
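To make the suggestion concrete, here is a minimal sketch of "merge, but remember when". It is not GUAC code: the `EntityGraph` class, its method names, the integer timestamps, and the example identifiers are all hypothetical, and a real graph store would persist this differently. The idea is that merges are recorded as timestamped facts and never applied destructively, so a query can rebuild the entity groups as they were known at any earlier time.

```python
from dataclasses import dataclass, field


@dataclass
class EntityGraph:
    """Toy identifier store (hypothetical) that keeps merge history."""

    # identifier -> time it was first seen
    seen: dict = field(default_factory=dict)
    # append-only log of (time, identifier_a, identifier_b)
    merges: list = field(default_factory=list)

    def add(self, identifier: str, at: int) -> None:
        # Keep the earliest time at which the identifier was seen.
        self.seen[identifier] = min(at, self.seen.get(identifier, at))

    def merge(self, a: str, b: str, at: int) -> None:
        """Record that, from time `at`, a and b are known to be one entity."""
        self.add(a, at)
        self.add(b, at)
        self.merges.append((at, a, b))

    def entities(self, as_of: int) -> set:
        """Return the entity groups as they were known at time `as_of`."""
        # Start with every identifier known by `as_of` in its own group.
        parent = {i: i for i, t in self.seen.items() if t <= as_of}

        def find(x: str) -> str:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        # Replay only the merges that had been recorded by `as_of`.
        for at, a, b in self.merges:
            if at <= as_of:
                parent[find(a)] = find(b)

        groups = {}
        for i in parent:
            groups.setdefault(find(i), set()).add(i)
        return {frozenset(g) for g in groups.values()}
```

With two identifiers first seen at time 1 and merged at time 5, `entities(as_of=4)` returns them as two separate entities and `entities(as_of=5)` returns a single merged entity, which is what allows a query made today to be compared with the result the same query would have given in the past.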
This part is complete: it is addressed by the new GraphQL model from https://github.com/guacsec/guac/issues/217