trivy icon indicating copy to clipboard operation
trivy copied to clipboard

fix(docker): fix non-det scan results for images with embedded SBOM

Open AndreOganesian opened this issue 1 month ago • 9 comments

Description

Update docker deduplication logic to prioritize packages with PURL matching OS to ensure deterministic results. Details found in linked issue

Related issues

  • Close #9865

Related PRs

  • [ ] #9142
  • [ ] #8298

Checklist

  • [x] I've read the guidelines for contributing to this repository.
  • [x] I've followed the conventions in the PR title.
  • [x] I've added tests that prove my fix is effective or that my feature works.
  • [ ] I've updated the documentation with the relevant information (if needed).
  • [ ] I've added usage information (if the PR introduces new options)
  • [ ] I've included a "before" and "after" example to the description (if the PR is a user interface change).

AndreOganesian avatar Dec 03 '25 06:12 AndreOganesian

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Dec 03 '25 06:12 CLAassistant

Hi @AndreOganesian, Thank you for your work.

I'm not sure deduplication is what we want in these cases.
On the contrary — we probably need to separate these packages.

Let’s look at the two apk-tools packages:

  1. The first one is obtained from lib/apk/db/installed; we assign it the Chainguard OS and generate the corresponding PURL.
  2. The second one comes from the SBOM, and its PURL contains a different OS.

Strictly speaking, based on OS origin, we cannot say these two packages are the same.
Therefore, we should not merge/deduplicate them — they must remain separate.

However, there is an important detail:
Even though this distinction is valid for SBOM processing, it causes problems for vulnerability scanning — we will pick the wrong OS and therefore the wrong advisory database.

I don’t think we should always remove such packages, but I also don’t see a simple, clean solution for filtering this inside the vulnerability scanner.

@knqyf263 — what do you think?

In general, when parsing SBOMs, we already keep packages only for a single OS.
So removing packages that belong to a “non-matching” OS might also be a valid option. On the other hand, for RPM packages extracted from archives, we use found OS.

DmitriyLewen avatar Dec 03 '25 08:12 DmitriyLewen

In the first place, why is an incorrect SBOM included in the container image? Shouldn’t we ask Chainguard to fix it?

knqyf263 avatar Dec 03 '25 14:12 knqyf263

Thanks for the comments. From my understanding, this isn't really an "incorrect" SBOM since it's valid from a build perspective, as Chainguard images (OS) are built from Wolfi packages, so the embedded SBOM reflects the origin (wolfi) while the container itself is identified as Chainguard. The issue arises in the cases where the chainguard and wolfi diffs differ, which is the actual issue.

Also to clarify Dmitriy's original point, the current state of Trivy is already doing deduplication and only using one of the chainguard or wolfi if both appear. This change simply forces it to match with the OS, in the event of a split, which doesn't really change the core logic, but just makes it consistent.

Not sure what the cleanest solution is for the long term state

AndreOganesian avatar Dec 03 '25 18:12 AndreOganesian

Also to clarify Dmitriy's original point, the current state of Trivy is already doing deduplication and only using one of the chainguard or wolfi if both appear. This change simply forces it to match with the OS, in the event of a split, which doesn't really change the core logic, but just makes it consistent.

Previously, we assumed that all packages belonged to a single OS. We hadn’t seen cases like this before, so the existing deduplication logic worked correctly.

This is why I mentioned that deduplication is incorrect in this situation, and the logic needs to be revised.

as Chainguard images (OS) are built from Wolfi packages

Do you have origin image with this case?

DmitriyLewen avatar Dec 04 '25 07:12 DmitriyLewen

Thanks for the comments. From my understanding, this isn't really an "incorrect" SBOM since it's valid from a build perspective, as Chainguard images (OS) are built from Wolfi packages, so the embedded SBOM reflects the origin (wolfi) while the container itself is identified as Chainguard. The issue arises in the cases where the chainguard and wolfi diffs differ, which is the actual issue.

It is common for the origin and supplier of a package to differ. For example, the creator of the glibc package is the Free Software Foundation, while the supplier may be Red Hat or others. In such cases, the SPDX “Package originator” field should be used. https://spdx.github.io/spdx-spec/v2.3/package-information/#76-package-originator-field

If the PURL points to the Free Software Foundation in this context, I believe that would still be incorrect. It should be Red Hat. Regardless of where Chainguard obtained the package, if it is currently being distributed as a Chainguard package, shouldn’t the PURL reflect that?

On the other hand, if Chainguard is not redistributing the packages and is only using the Wolfi packages as they are, then they should not be associated with Chainguard OS, and they should indeed be detected separately, as Dmitriy said. I think it’s a similar kind of relationship to how a binary installed from GitHub Releases via curl has no connection to the operating system.

knqyf263 avatar Dec 04 '25 08:12 knqyf263

Thanks both, understood. I unfortunately cannot share the origin image, but tried to represent it as best as possible with my example.

So, confirming that this is seen as a Chainguard issue rather than something to work around in Trivy?

If so, I can instead send out a change to do a consistent sort, to at least ensure the logic is deterministic?

AndreOganesian avatar Dec 04 '25 13:12 AndreOganesian

Ideally, Chainguard should fix this on their side, IMO. However, since there is a possibility that SBOMs with mismatched OS information could be included in images, I think it would be acceptable to add a workaround on the Trivy side.

Rather than merging/deduplicating, how about simply filtering out packages with mismatched PURL namespaces?

diff --git a/pkg/fanal/applier/docker.go b/pkg/fanal/applier/docker.go
index a5c8f44c3..08925881f 100644
--- a/pkg/fanal/applier/docker.go
+++ b/pkg/fanal/applier/docker.go
@@ -235,6 +235,9 @@ func ApplyLayers(layers []ftypes.BlobInfo) ftypes.ArtifactDetail {
 		}
 	}
 
+	// Filter OS packages with mismatched PURL namespace
+	mergedLayer.Packages = filterMismatchedOSPkgs(mergedLayer.OS.Family, mergedLayer.Packages)
+
 	// De-duplicate same debian packages from different dirs
 	// cf. https://github.com/aquasecurity/trivy/issues/8297
 	mergedLayer.Packages = xslices.ZeroToNil(lo.UniqBy(mergedLayer.Packages, func(pkg ftypes.Package) string {
@@ -343,3 +346,41 @@ func secretFindingsContains(findings []ftypes.SecretFinding, finding ftypes.Secr
 	}
 	return false
 }
+
+// purlMatchesOS checks if a package's PURL namespace matches the detected OS family.
+// Returns true if the package should be kept (matches OS or has no PURL/namespace).
+// Returns false if the package should be filtered out (has PURL with mismatched namespace).
+func purlMatchesOS(pkg ftypes.Package, osFamily ftypes.OSType) bool {
+	if pkg.Identifier.PURL == nil || osFamily == "" {
+		return true // Keep packages without PURL or when OS is not detected
+	}
+	if pkg.Identifier.PURL.Namespace == "" {
+		return true // Keep packages without namespace
+	}
+	return pkg.Identifier.PURL.Namespace == string(osFamily)
+}
+
+// filterMismatchedOSPkgs removes OS packages whose PURL namespace doesn't match the detected OS.
+// Packages with pre-existing PURLs are typically from SBOM files embedded in the image.
+func filterMismatchedOSPkgs(osFamily ftypes.OSType, pkgs ftypes.Packages) ftypes.Packages {
+	if osFamily == "" {
+		return pkgs // No OS detected, keep all packages
+	}
+
+	var filtered int
+	result := lo.Filter(pkgs, func(pkg ftypes.Package, _ int) bool {
+		if purlMatchesOS(pkg, osFamily) {
+			return true
+		}
+		filtered++
+		return false
+	})
+
+	if filtered > 0 {
+		log.WithPrefix("applier").Warn("Some OS packages were skipped due to mismatched PURL namespace",
+			log.Int("pkg_count", filtered),
+			log.String("detected_os", string(osFamily)))
+	}
+
+	return result
+}

knqyf263 avatar Dec 09 '25 10:12 knqyf263

Updated PR with your suggestions, now filter packages with mismatching PURL namespace instead.

Thanks for helping with this! Let me know if any other comments

AndreOganesian avatar Dec 10 '25 06:12 AndreOganesian