[Discussion] Principles of IsDependency data model and beyond

Open lumjjb opened this issue 2 years ago • 10 comments

In response to issues raised around IsDependency in #594 and #965, the following design doc was written to help address them.

https://docs.google.com/document/d/1A2Fz0TLTsYAsJnTlztCdH05XnMM9llcpVqZtQa5Wt6M/edit

During the use and development of GUAC, IsDependency has been a tricky predicate. Several factors contribute: the quality of the data feeding the predicate, the slightly differing intent and meaning behind it (declared build dependencies vs. statically linked vs. dynamically linked dependencies), and general variance in behavior across ecosystems.

This document discusses how to reason about such issues, using IsDependency as an example, and then proposes some potential solutions to the IsDependency problem.

lumjjb avatar Jun 22 '23 00:06 lumjjb

@knrc

lumjjb avatar Jun 26 '23 14:06 lumjjb

@lumjjb Thanks very much, I'll take a look at this later today

knrc avatar Jul 06 '23 16:07 knrc

@lumjjb It looks as if this could address the issue. Do you want me to give it a try? My concern would be the increased verbosity when ingesting.

knrc avatar Jul 07 '23 14:07 knrc

@knrc With the separation of "noun" and "verb" ingestion, and batch ingestion of "nouns" and "verbs", we should be OK in terms of ingestion.

pxp928 avatar Jul 07 '23 14:07 pxp928

@pxp928 I haven't looked at the code in a couple of weeks, but from what I remember there is a lot of duplication in what is sent to the server. I'll update and see what I've missed.

knrc avatar Jul 07 '23 16:07 knrc

Duplication is still an issue to resolve, but in terms of ingestion we have been trying to make improvements as we work with a graph database backend.

pxp928 avatar Jul 07 '23 16:07 pxp928

@pxp928 Okay, in that case it will still be an issue, and this would add much more data to what is ingested.

knrc avatar Jul 07 '23 17:07 knrc

How so? You would just be adding links to the existing isDependencies, isOccurrences, and packages or artifacts.

pxp928 avatar Jul 07 '23 17:07 pxp928

@pxp928 True, but that is resolved on the backend, so the information still has to be transmitted.

knrc avatar Jul 07 '23 17:07 knrc

We went ahead with the solution from "Efficient retrieval with HasSBOM evidence tree edges", implemented in https://github.com/guacsec/guac/pull/1367.

This does open up a further discussion of how IsDependency should be used. In the ontology, IsDependency should no longer be used as a way to traverse edges, because of the problems highlighted in #594 and because of under- or over-fitted software identifiers. IsDependency is therefore only used in the context of a collection (in this case HasSBOM). The next step for IsDependency should be to remove it as a top-level operation, so that it is only referenced through HasSBOM.

lumjjb avatar Apr 02 '24 23:04 lumjjb
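
As a rough illustration of the direction described in the last comment, the sketch below models IsDependency edges that are only reachable through the HasSBOM that asserts them. It is not GUAC's actual API or schema: the class names mirror the predicates discussed in the thread, but the fields (`subject`, `included_dependencies`), the helper `dependencies_in_sbom`, and the package identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IsDependency:
    """Hypothetical edge: `package` depends on `dependency`."""
    package: str
    dependency: str


@dataclass
class HasSBOM:
    """Hypothetical SBOM node that owns the dependency edges it asserts."""
    subject: str
    included_dependencies: list[IsDependency] = field(default_factory=list)


def dependencies_in_sbom(sbom: HasSBOM, package: str) -> set[str]:
    """Return transitive dependencies of `package`, following only
    the edges listed in `sbom` (never edges from other SBOMs)."""
    # Build an adjacency map restricted to this SBOM's edges.
    edges: dict[str, list[str]] = {}
    for dep in sbom.included_dependencies:
        edges.setdefault(dep.package, []).append(dep.dependency)

    # Iterative depth-first walk; `seen` also guards against cycles.
    seen: set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Two SBOMs mention the same package (liba) with different edges.
sbom_a = HasSBOM("pkg:app@1.0", [
    IsDependency("pkg:app@1.0", "pkg:liba@2.0"),
    IsDependency("pkg:liba@2.0", "pkg:libb@1.1"),
])
sbom_b = HasSBOM("pkg:other@3.0", [
    IsDependency("pkg:liba@2.0", "pkg:libc@0.9"),
])

print(sorted(dependencies_in_sbom(sbom_a, "pkg:app@1.0")))
```

Because the walk is scoped to one SBOM, the `liba -> libc` edge asserted by `sbom_b` does not leak into the result for `sbom_a`. A global traversal over all IsDependency edges would pick it up, which is the kind of over-matching the thread attributes to traversing IsDependency directly.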