[Discussion] Principles of IsDependency data model and beyond
In response to some issues raised around IsDependency (#594 and #965), the following design doc was written to help address them.
https://docs.google.com/document/d/1A2Fz0TLTsYAsJnTlztCdH05XnMM9llcpVqZtQa5Wt6M/edit
During the use and development of GUAC, one predicate that has been tricky is IsDependency. This is due to multiple factors, including the quality of the data contributing to the predicates, the slightly differing intent and meaning of the predicates (declared build dependencies vs. statically linked vs. dynamically linked dependencies), and general variance in behavior across ecosystems.
This document discusses how to reason about such issues using IsDependency as an example, and then proposes some potential solutions to the IsDependency problem.
@knrc
@lumjjb Thanks very much, I'll take a look at this later today
@lumjjb It looks as if this could address the issue, do you want me to give it a try? My concern would be the increased verbosity when ingesting.
@knrc With the separation of "noun" and "verb" ingestion and batch ingestion of "nouns" and "verbs", we should be ok in terms of ingesting.
@pxp928 I haven't looked at the code in a couple of weeks, but from what I remember there is a lot of duplication in what is sent to the server. I'll update and see what I've missed
Duplication is still an issue to resolve, but in terms of ingestion we have been trying to make improvements as we work with a graph database backend.
@pxp928 okay, in that case it will still be an issue and this would add much more data to what is ingested
How so? You would just be adding links to the existing isDependencies, isOccurrences and packages or artifacts
@pxp928 true, but that is resolved on the backend so the information still has to be transmitted.
We went ahead with the solution from "Efficient retrieval with HasSBOM evidence tree edges", implemented in https://github.com/guacsec/guac/pull/1367.
This does open up a further discussion of how IsDependency should be used. In the ontology, IsDependency should no longer be used as a way to traverse edges, because of the problem highlighted in #594 and because of under- or over-fitted software identifiers. Therefore, IsDependency is only used in the context of a collection (in this case HasSBOM). The next step for IsDependency should be to remove it as a top-level operation, so that it is only referenced through HasSBOM.
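To make the "only referenced through HasSBOM" idea concrete, here is a minimal sketch in Python. It is not GUAC's actual schema or API: the class names mirror the predicates discussed here, but the field names (`subject`, `included_dependencies`), the helper `dependencies_in_sbom`, and the example package URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class IsDependency:
    """A dependency edge. Hypothetical shape, for illustration only."""
    package: str     # the depending package (a purl-like string)
    dependency: str  # the package it depends on


@dataclass
class HasSBOM:
    """A collection that scopes the dependency edges it contains (assumed fields)."""
    subject: str
    included_dependencies: List[IsDependency] = field(default_factory=list)


def dependencies_in_sbom(sbom: HasSBOM, package: str) -> List[str]:
    """Return the dependencies of `package`, but only those recorded in this SBOM."""
    return [d.dependency for d in sbom.included_dependencies if d.package == package]


# Two SBOMs that disagree about what the same package identifier depends on.
sbom_a = HasSBOM(
    subject="pkg:generic/app@1.0",
    included_dependencies=[IsDependency("pkg:generic/app@1.0", "pkg:generic/lib@1.2")],
)
sbom_b = HasSBOM(
    subject="pkg:generic/other@2.0",
    included_dependencies=[IsDependency("pkg:generic/app@1.0", "pkg:generic/lib@1.3")],
)
```

Because the lookup always starts from one HasSBOM, the two conflicting edges for `pkg:generic/app@1.0` never get merged into a single global traversal; each answer is tied to the document that asserted it.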