syft
Online verification of artifacts
This issue is meant to be a spot to host discussion on a couple of related topics:
- Should syft gather information from external sources (e.g. maven.org, pypi.org, rubygems.org, etc.) in order to enrich the information provided in the SBOM with select point-in-time external package data? For example, for all jars found, use the SHA1 of the jar to search for an authoritative ArtifactID and GroupID for the jar (since sometimes the packaged data is inaccurate or missing).
- Should syft verify information found within scanned artifacts against external sources? And if so, list verification claims directly in the SBOM?
This issue has intentionally been left open-ended to gather feedback and specific use cases from the community.
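To make the SHA1 lookup from the first question concrete, here is a rough sketch of what such a query might look like. It assumes the public Maven Central search endpoint at `search.maven.org/solrsearch/select`, its `1:"<sha1>"` checksum query syntax, and a response whose `response.docs` entries carry `g`, `a`, and `v` keys; all of that should be checked against the current Maven Central documentation. The function names are hypothetical and nothing here reflects how syft itself is implemented.

```python
import hashlib
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Maven Central search API; verify before relying on it.
SEARCH_URL = "https://search.maven.org/solrsearch/select"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_query_url(sha1_hex):
    """Build a checksum search URL (query syntax is an assumption)."""
    params = {"q": f'1:"{sha1_hex}"', "rows": "20", "wt": "json"}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_coordinates(payload):
    """Extract (groupId, artifactId, version) tuples from a decoded response."""
    docs = payload.get("response", {}).get("docs", [])
    return [
        (d["g"], d["a"], d["v"])
        for d in docs
        if all(key in d for key in ("g", "a", "v"))
    ]


def lookup_jar(path, timeout=10):
    """Hash a jar and ask the remote index which coordinates match it."""
    url = build_query_url(sha1_of_file(path))
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_coordinates(json.load(resp))
```

Calling `lookup_jar("some.jar")` performs a network request, so the result is only a point-in-time answer and an empty list just means the index did not recognize the checksum.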
These are great questions @wagoodman
My knee-jerk reaction to this was "why wouldn't it!"
But upon further thought, I'm less certain.
Today Syft is functionally a tool for taking a snapshot of some collection of artifacts (I know it does some other things like convert between SBOM formats, but that's a different discussion, let's just pretend it only collects details right now).
I think what you describe in both of these cases is a second-pass operation. Pass one is to collect details; pass two is to enrich the data. If we make Syft do both of these, we will be adding A LOT of new functionality to Syft, which increases its complexity.
Maybe the real question is should Syft do one thing, or should Syft do many things?
There would be massive value in using this information to enrich an SBOM, and I suspect we all agree we want a tool to do this. Should Syft do it or should a new tool do it?
I had always envisioned some other tool after syft doing the enrichment. Then you have a trail from the original SBOM and the option to use the enrichment or not, without complicating syft with more options to do everything.
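As a sketch of that "separate second pass" idea, something like the following could sit after syft: it takes an already-parsed SBOM, leaves it untouched, and returns an enriched copy. The top-level `artifacts` list is an assumption about the shape of the input JSON, and the `enrichment` key and the `lookup` callback are hypothetical names made up for illustration.

```python
import copy


def enrich_sbom(sbom, lookup):
    """Return an enriched deep copy of `sbom`; the original is not modified.

    `lookup` is a caller-supplied function that takes one package dict and
    returns extra data (or None / an empty value when nothing is known).
    """
    enriched = copy.deepcopy(sbom)
    for pkg in enriched.get("artifacts", []):  # assumed field name
        extra = lookup(pkg)
        if extra:
            pkg["enrichment"] = extra  # hypothetical key
    return enriched
```

Because the original document is preserved, a consumer can diff the two to see exactly what the enrichment step claimed, or ignore the enriched copy entirely.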
While it looks like a decision is already made via #1158 I wanted to link to https://github.com/anchore/syft/issues/1129 as a sample for this discussion. As some ecosystems don't commit full dependency tree info (e.g. spring boot pom.xml) into the git repo, an un-enriched scan isn't able to produce a complete and accurate sbom from the source repo.
While there are a few directions enrichment can go in, they all have tradeoffs:
- wait for the build process (e.g. the jar manifests are more complete), but this is a slower feedback loop
- resolve external sources similar to how the build would, adding a larger footprint to Syft and requiring user configuration
- require the project directory to be "initialized" or have the build toolchain present, adding overhead to CI integration