Discuss how to handle identifiers and duplicate scenarios

Open lumjjb opened this issue 3 years ago • 1 comments

Some entities may have multiple identifiers. Let's figure out what's the best way to handle them, especially for merging nodes and insertion of new edges/relating new information. Another tricky question also revolves around empty identifier fields and possible lists of identifiers vs having multiple nodes.

https://github.com/guacsec/guac/pull/107#discussion_r979458261

FYI: @pxp928 @mlieberman85 @mihaimaruseac

lumjjb Sep 27 '22 20:09

This is, I think, the entity resolution problem, which can be a very deep hole. My advice would be to record the point in time from which you knew that A and B are the same entity. That is: before the entities are resolved, you treat them as separate; after resolving them, you merge them. But after merging, you can still recreate historical queries as if they hadn't been merged, when this is necessary.

Sounds like a hassle, and it is, but the alternative is being unable to audit the history of the graph by comparing queries made today to the results of queries made in the past.

jchestershopify Sep 28 '22 14:09
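
To make the suggestion above concrete, here is a minimal sketch of a time-aware merge record. It is not taken from GUAC or from the GraphQL model referenced later in the thread; `MergeLog`, `record_same`, `resolve`, the integer timestamps, and the identifier strings are all hypothetical names chosen for illustration. The idea it assumes: "A and B are the same" facts are stored in an append-only log together with the time they became known, and the merged view is rebuilt per query from only those facts known at the requested time.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MergeLog:
    """Append-only log of 'A and B are the same entity' facts (hypothetical design)."""

    # Each entry is (known_since, id_a, id_b). Entries are never edited or removed,
    # so older views of the graph can always be rebuilt.
    facts: list[tuple[int, str, str]] = field(default_factory=list)

    def record_same(self, known_since: int, id_a: str, id_b: str) -> None:
        """Record that id_a and id_b were learned to be the same entity at known_since."""
        self.facts.append((known_since, id_a, id_b))

    def resolve(self, identifier: str, as_of: Optional[int] = None) -> frozenset[str]:
        """Return every identifier treated as the same entity as `identifier`.

        Only facts with known_since <= as_of are applied; as_of=None applies all facts.
        """
        parent: dict[str, str] = {}

        def find(x: str) -> str:
            # Union-find lookup with path halving.
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for known_since, a, b in self.facts:
            if as_of is None or known_since <= as_of:
                parent[find(a)] = find(b)

        root = find(identifier)
        return frozenset(x for x in list(parent) if find(x) == root)


# Made-up identifiers and logical timestamps, for illustration only.
log = MergeLog()
log.record_same(known_since=10, id_a="purl-for-foo", id_b="cpe-for-foo")

before = log.resolve("purl-for-foo", as_of=5)  # view from before the merge was known
today = log.resolve("purl-for-foo")            # view using everything known so far
```

In this sketch `before` contains only the queried identifier, while `today` contains both, so a query made today can be compared against what the same query would have returned in the past. The cost is that the merged view is recomputed per query; a real store would likely cache or index this.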

Completed; this part is addressed by the new GraphQL model from https://github.com/guacsec/guac/issues/217

lumjjb May 18 '23 15:05