guac
[feature] Implement ingestion for layerID metadata
Is your feature request related to a problem? Please describe.
At present, GUAC does not ingest the layerID information for either the SPDX or CDX format. We would like this metadata ingestion to be enabled in GUAC.
Describe the solution you'd like
GUAC should parse and ingest the layerID information, starting with the SPDX and CDX formats.
Describe alternatives you've considered
N/A
Additional context
In SPDX files, layerID is present in the comment section of each entry in the files enumeration. In CDX files, it is present in the syft:location:0:layerID property of a component.
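To make the two locations concrete, here is a minimal Python sketch of pulling the layerID out of already-parsed SBOM JSON. It is not GUAC code. The function names are hypothetical, the `layerID: <digest>` comment layout for SPDX is an assumption about how the generator writes the comment, and the property index is generalized from `0` to any number on the assumption that a component can have several locations.

```python
import re
from typing import List, Optional

# Assumption: the CDX property name follows syft:location:<index>:layerID.
_CDX_LAYER_PROP = re.compile(r"^syft:location:\d+:layerID$")
# Assumption: the SPDX file comment looks like "layerID: sha256:...".
_SPDX_LAYER_COMMENT = re.compile(r"layerID:\s*(\S+)")


def layer_ids_from_cdx_component(component: dict) -> List[str]:
    """Collect layerID values from a CycloneDX component's properties."""
    return [
        prop["value"]
        for prop in component.get("properties", [])
        if _CDX_LAYER_PROP.match(prop.get("name", "")) and "value" in prop
    ]


def layer_id_from_spdx_file(file_entry: dict) -> Optional[str]:
    """Extract the layerID from an SPDX file entry's comment, if present."""
    match = _SPDX_LAYER_COMMENT.search(file_entry.get("comment", ""))
    return match.group(1) if match else None
```

Both helpers return nothing (an empty list or `None`) when the field is absent, so SBOMs without layer information would pass through unchanged.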
Hmm, would layer IDs be okay to be represented as packages? and a DEPENDS_ON relationship matching them?
So like a layer ID would look a bit like "pkg:container_layer/sha256:abdef...", and then have an IsOccurrence to that matching Artifact. So this would resemble how we handle files.
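A rough Python sketch of that idea, for discussion only: it turns a layerID into the package identifier shape suggested above and pairs it with a matching artifact, the way an occurrence would. The `container_layer` type comes from this comment and is not a registered pURL type; the `Artifact` and `Occurrence` classes are hypothetical stand-ins, not GUAC's actual models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    algorithm: str
    digest: str


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical stand-in for an IsOccurrence link: package -> artifact."""
    package_purl: str
    artifact: Artifact


def layer_occurrence(layer_id: str) -> Occurrence:
    """Build the package identifier and matching artifact for one layer.

    Expects a layerID of the form "<algorithm>:<hex digest>".
    """
    algorithm, sep, digest = layer_id.partition(":")
    if not sep or not algorithm or not digest:
        raise ValueError(f"unexpected layerID format: {layer_id!r}")
    return Occurrence(
        package_purl=f"pkg:container_layer/{layer_id}",
        artifact=Artifact(algorithm=algorithm.lower(), digest=digest.lower()),
    )
```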
Thoughts? @pxp928 @mihaimaruseac @mlieberman85
I think it should be ok and a relationship INCLUDED_IN that links one to the next layer(s) that incorporate it. We can use this for ML models that start from other pretrained models too (though a different pkg:model/ pURL prefix)
Hmm...how would we do the new relationship INCLUDED_IN? @mihaimaruseac. Yeah, the approach makes sense for the layerID. Probably worth considering adding a specific "type" for layerIDs so that it's easier to filter on.
Parsing the docker file or similar we can extract each layer from the base one all the way to the final container. Each layer has a digest which can identify the layer node in GUAC and from that one we can build the relationship to the next one. Will still be a verb, so we'd encode which docker container gave us the link
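As an illustration of the chaining described here, the following Python sketch takes layer digests ordered from the base layer to the final one and emits one edge per adjacent pair, each tagged with the container that supplied the link. `INCLUDED_IN` is the verb proposed in this thread, not an existing GUAC predicate, and the `Edge` type and function name are hypothetical.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: str      # digest of the earlier layer
    predicate: str   # proposed verb, not an existing GUAC predicate
    target: str      # digest of the next layer that incorporates it
    origin: str      # container that gave us this link


def included_in_edges(layer_digests: List[str], origin: str) -> List[Edge]:
    """Link each layer to the next one, from the base layer to the final layer."""
    return [
        Edge(earlier, "INCLUDED_IN", later, origin)
        for earlier, later in zip(layer_digests, layer_digests[1:])
    ]
```

An image with a single layer yields no edges, since there is no next layer to link to.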
so a new verb is needed for INCLUDED_IN relationship. Not being done with existing verbs. Correct?
Yes.
Hmm, what do you mean by the INCLUDED_IN relationship? I'm not too sure I follow. Since layers are just tarballs, I'm assuming it would be between the container package and the layers it includes, not between layers?
btw, SPDX 3.0 has gone towards the direction of including qualifiers on DEPENDS_ON relationship, not sure if we'd want to consider that as well.
I think it's more semantics. We already use DEPENDS_ON (IsDependency) to mean that a package depends on another. I was thinking of a new predicate instead of reusing the existing one, to preempt having to discern between "does this edge mean that A is included/vendored in B?" and "does this edge mean that you need A in order to use B?" (at runtime/buildtime, but we don't differentiate these dependencies right now).
Based on discussion, we need to determine how we can represent the various types of IsDependency. One method is adding qualifiers to the IsDependency node such that it gives us greater detail about the type of dependency.
Here's a proposal on how to encode layerID and adjacent container image metadata
https://docs.google.com/document/d/11WqkncYYob8MtNkcvTZiYcjbvclT15UKFh6coDjJToU/edit
I'm interested in working on this one.
After some discussions with @pxp928, @lumjjb, and @fengalex43, there have been a few updates to the proposal that was originally shared by Brandon.
- Instead of creating a new model called `HasMetadataLink`, the existing `HasMetadata` will be used for describing base image relationships. `HasMetadata` will have a new optional `subject` field that will be used to connect the base image OCI package to the container image OCI package.
- Instead of using `HasMetadataLink` to describe the relationship between a file and a layer, the existing `IsDependency` will be used to connect a file to a layer. A new field will be added to denote the "type" of dependency it is, not to be confused with the existing `dependencyType` field. We still need to finalize the new field name.
The high-level idea behind this change is that HasMetadata should link models that are in different SBOMs, whereas IsDependency should link models found within a single SBOM.
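To visualize the split, here is a loose Python sketch of the two record shapes. It is not GUAC's schema. Apart from `subject` and the existing `dependencyType`, every field name is a placeholder. That includes `link_kind`, since the new field name is not finalized, and `target`, which stands in for whatever the metadata is attached to. The `base-image` key, the `UNKNOWN` value and the direction of each link are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class HasMetadata:
    target: str                    # placeholder: container image OCI package
    key: str
    value: str
    subject: Optional[str] = None  # proposed optional field: base image OCI package


@dataclass(frozen=True)
class IsDependency:
    package: str           # assumed direction: the layer
    dependency: str        # assumed direction: the file found in that layer
    dependency_type: str   # stands in for the existing dependencyType field
    link_kind: str         # placeholder for the new, not yet named field


def describe_image(
    image: str, base_image: str, files_by_layer: Dict[str, List[str]]
) -> List[Union[HasMetadata, IsDependency]]:
    """Emit one cross-SBOM link to the base image and one in-SBOM link per file."""
    records: List[Union[HasMetadata, IsDependency]] = [
        HasMetadata(target=image, key="base-image", value=base_image, subject=base_image)
    ]
    for layer, files in files_by_layer.items():
        for path in files:
            records.append(
                IsDependency(
                    package=layer,
                    dependency=path,
                    dependency_type="UNKNOWN",
                    link_kind="LAYER_CONTAINS_FILE",
                )
            )
    return records
```

The base image is described by its own SBOM, so it is reached through `HasMetadata`, while files and layers both come from the container image's SBOM and are linked through `IsDependency`.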