[feature] Implement ingestion for layerID metadata

Open stevemenezes opened this issue 2 years ago • 12 comments

Is your feature request related to a problem? Please describe.

At present, GUAC does not ingest layerID information for either the SPDX or the CDX format. We would like GUAC to ingest this metadata.

Describe the solution you'd like

GUAC should parse and ingest the layerID information, starting with the SPDX and CDX formats.

Describe alternatives you've considered

N/A

Additional context

In SPDX files, layerID is present in the comment section of each entry in the files enumeration. In CDX files, it is present in the syft:location:0:layerID property of a component.

stevemenezes • Jun 22 '23 22:06
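As an illustration of the two locations named in the issue description, here is a minimal Go sketch that pulls a layerID out of a CDX component and an SPDX file entry. It is not GUAC's parser. The struct and function names are hypothetical, the digest values are placeholders, and the SPDX comment is assumed to have the form "layerID: <digest>", which should be checked against real Syft output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// cdxProperty and cdxComponent model only the CycloneDX fields needed here.
type cdxProperty struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type cdxComponent struct {
	Name       string        `json:"name"`
	Properties []cdxProperty `json:"properties"`
}

// spdxFile models only the SPDX file fields needed here.
type spdxFile struct {
	FileName string `json:"fileName"`
	Comment  string `json:"comment"`
}

// layerIDFromCDX returns the value of the syft:location:0:layerID property.
func layerIDFromCDX(c cdxComponent) (string, bool) {
	for _, p := range c.Properties {
		if p.Name == "syft:location:0:layerID" {
			return p.Value, true
		}
	}
	return "", false
}

// layerIDFromSPDX assumes the comment looks like "layerID: <digest>".
func layerIDFromSPDX(f spdxFile) (string, bool) {
	const prefix = "layerID:"
	if !strings.HasPrefix(f.Comment, prefix) {
		return "", false
	}
	id := strings.TrimSpace(strings.TrimPrefix(f.Comment, prefix))
	return id, id != ""
}

func main() {
	// Placeholder documents; the digests are made up for the example.
	cdxJSON := `{"name":"busybox","properties":[{"name":"syft:location:0:layerID","value":"sha256:aaa111"}]}`
	spdxJSON := `{"fileName":"/bin/busybox","comment":"layerID: sha256:aaa111"}`

	var c cdxComponent
	if err := json.Unmarshal([]byte(cdxJSON), &c); err != nil {
		panic(err)
	}
	var f spdxFile
	if err := json.Unmarshal([]byte(spdxJSON), &f); err != nil {
		panic(err)
	}

	cdxID, cdxOK := layerIDFromCDX(c)
	spdxID, spdxOK := layerIDFromSPDX(f)
	fmt.Println(cdxID, cdxOK, spdxID, spdxOK)
}
```

Only index 0 of the syft:location properties is read here because that is the property the issue names; a component with several locations would need a loop over the index.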

Hmm, would it be okay to represent layer IDs as packages, with a DEPENDS_ON relationship matching them?

So a layer ID would look a bit like "pkg:container_layer/sha256:abdef...", and then have an IsOccurrence linking it to the matching Artifact. This would resemble how we handle files.

Thoughts? @pxp928 @mihaimaruseac @mlieberman85

lumjjb • Jun 30 '23 17:06
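The package-plus-occurrence idea in the comment above can be sketched roughly as follows. This is an assumption-heavy illustration: the `artifact` and `occurrence` types are local stand-ins rather than GUAC's own types, `layerOccurrence` is a hypothetical helper, and `container_layer` is only the purl type floated in the comment, not a registered one.

```go
package main

import (
	"fmt"
	"strings"
)

// artifact and occurrence are simplified stand-ins for illustration only.
type artifact struct {
	Algorithm string
	Digest    string
}

type occurrence struct {
	PackagePurl string
	Artifact    artifact
}

// layerOccurrence turns a layer ID of the form "<algorithm>:<digest>" into a
// package purl (using the proposed, unofficial container_layer type) paired
// with the artifact it would be an occurrence of.
func layerOccurrence(layerID string) (occurrence, error) {
	alg, digest, ok := strings.Cut(layerID, ":")
	if !ok || alg == "" || digest == "" {
		return occurrence{}, fmt.Errorf("layer ID %q is not in <algorithm>:<digest> form", layerID)
	}
	return occurrence{
		PackagePurl: "pkg:container_layer/" + layerID,
		Artifact:    artifact{Algorithm: alg, Digest: digest},
	}, nil
}

func main() {
	// Placeholder digest, made up for the example.
	occ, err := layerOccurrence("sha256:aaa111")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s:%s\n", occ.PackagePurl, occ.Artifact.Algorithm, occ.Artifact.Digest)
}
```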

I think it should be OK, along with an INCLUDED_IN relationship that links each layer to the next layer(s) that incorporate it. We can use this for ML models that start from other pretrained models too (though with a different pkg:model/ pURL prefix).

mihaimaruseac • Jun 30 '23 17:06

Hmm... how would we do the new INCLUDED_IN relationship, @mihaimaruseac? Yeah, the approach makes sense for the layerID. It's probably worth considering adding a specific "type" for layerIDs so that it's easier to filter on.

pxp928 • Jun 30 '23 17:06

By parsing the Dockerfile or similar, we can extract each layer, from the base one all the way to the final container. Each layer has a digest that can identify the layer node in GUAC, and from that node we can build the relationship to the next one. It will still be a verb, so we'd encode which Docker container gave us the link.

mihaimaruseac • Jun 30 '23 18:06
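The layer chaining described in the comment above could look roughly like this. It is a sketch under assumptions: `includedIn` is a hypothetical edge type (the thread is still debating whether such a verb should exist), the layer digests are placeholders, and the image reference is invented for the example.

```go
package main

import "fmt"

// includedIn is a hypothetical edge: Layer is incorporated by NextLayer,
// and Source records which container image the link was derived from.
type includedIn struct {
	Layer     string
	NextLayer string
	Source    string
}

// chainLayers takes layer digests ordered from the base layer to the final
// one and links each layer to the layer that directly follows it.
func chainLayers(image string, layers []string) []includedIn {
	var edges []includedIn
	for i := 1; i < len(layers); i++ {
		edges = append(edges, includedIn{
			Layer:     layers[i-1],
			NextLayer: layers[i],
			Source:    image,
		})
	}
	return edges
}

func main() {
	// Placeholder digests and image name.
	layers := []string{"sha256:base000", "sha256:mid111", "sha256:top222"}
	for _, e := range chainLayers("example.test/app:1.0", layers) {
		fmt.Printf("%s INCLUDED_IN %s (from %s)\n", e.Layer, e.NextLayer, e.Source)
	}
}
```

Keeping the source image on each edge reflects the point that the same layer digest can appear in many containers, so the edge alone does not say where the link was observed.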

So a new verb is needed for the INCLUDED_IN relationship, rather than doing it with existing verbs. Correct?

pxp928 • Jun 30 '23 18:06

Yes.

mihaimaruseac • Jun 30 '23 18:06

Hmm, what do you mean by the INCLUDED_IN relationship? I'm not too sure I follow. Since layers are just tarballs, I'm assuming that it would be for the container package that includes the layers, and not between layers?

Btw, SPDX 3.0 has moved in the direction of including qualifiers on the DEPENDS_ON relationship; not sure if we'd want to consider that as well.

lumjjb • Jul 07 '23 17:07

I think it's more about semantics. We already use DEPENDS_ON (IsDependency) to mean that a package depends on another. I was thinking of a new predicate instead of reusing the existing one, to preempt having to discern between "does this edge mean that A is included/vendored in B?" and "does this edge mean that you need A in order to use B?" (at runtime/buildtime, but we don't differentiate these dependencies right now).

mihaimaruseac • Jul 07 '23 17:07

Based on the discussion, we need to determine how to represent the various types of IsDependency. One option is to add qualifiers to the IsDependency node so that it gives us more detail about the type of dependency.

pxp928 • Jul 12 '23 21:07

Here's a proposal on how to encode layerID and adjacent container image metadata:

https://docs.google.com/document/d/11WqkncYYob8MtNkcvTZiYcjbvclT15UKFh6coDjJToU/edit

lumjjb • Aug 21 '23 21:08

I'm interested in working on this one.

ridhoq • Jun 03 '24 22:06

After some discussions with @pxp928, @lumjjb, and @fengalex43, there have been a few updates to the proposal originally shared by Brandon.

  1. Instead of creating a new model called HasMetadataLink, the existing HasMetadata will be used for describing base image relationships. HasMetadata will have a new optional subject field that will be used to connect the base image OCI package to the container image OCI package.
  2. Instead of using HasMetadataLink to describe the relationship between a file and a layer, the existing IsDependency will be used to connect a file to a layer. A new field will be added to denote the "type" of dependency it is, not to be confused with the existing dependencyType field. We still need to finalize the new field name.

The high-level idea behind this change is that HasMetadata should link models that are in different SBOMs, whereas IsDependency should link models found within a single SBOM.

ridhoq • Aug 26 '24 17:08
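One possible reading of the two updates above, sketched in Go. Everything here is an assumption made for illustration: the structs are simplified local types and not GUAC's schema, `LinkedSubject` stands in for the new optional subject field, `RelationKind` stands in for the new field whose name is not yet finalized, the "base-image" key and "CONTAINS" value are invented, and the direction of the layer-to-file edge is a guess.

```go
package main

import "fmt"

// hasMetadata is a simplified stand-in. LinkedSubject is a hypothetical name
// for the new optional subject field; nil means "no linked subject".
type hasMetadata struct {
	Subject       string
	Key           string
	Value         string
	LinkedSubject *string
}

// isDependency is a simplified stand-in. RelationKind is a hypothetical name
// for the new field and is separate from the existing dependencyType field.
type isDependency struct {
	Package      string
	Dependency   string
	RelationKind string
}

// baseImageLink connects a container image to its base image. The two images
// would typically be described by different SBOMs.
func baseImageLink(image, baseImage string) hasMetadata {
	return hasMetadata{
		Subject:       image,
		Key:           "base-image", // invented key
		Value:         baseImage,
		LinkedSubject: &baseImage,
	}
}

// fileInLayer connects a layer and a file found within a single SBOM.
func fileInLayer(layer, file string) isDependency {
	return isDependency{Package: layer, Dependency: file, RelationKind: "CONTAINS"}
}

func main() {
	// Placeholder identifiers, made up for the example.
	md := baseImageLink("example.test/app:1.0", "example.test/base:3")
	dep := fileInLayer("sha256:aaa111", "/bin/busybox")
	fmt.Println(md.Subject, md.Key, *md.LinkedSubject)
	fmt.Println(dep.Package, dep.RelationKind, dep.Dependency)
}
```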