feat: build config can specify package CPE
This draft PR is trying out an idea where a package's build config could specify its own CPE values. Today our scanning system maintains its own list of known CPEs for packages with certain names. But perhaps it's better for each package to define its correct CPE on its own (if it has one), which would remove the need for us to keep a central list up to date.
Improving CPE identification for packages will help us match packages to CVEs more reliably (specifically, by improving recall), because it's not always possible to derive the correct CPE from the package's other metadata alone.
Here's how this might look for cosign:
package:
  name: cosign
  version: 2.4.1
  epoch: 5
  description: Container Signing
  cpe:
    vendor: sigstore
    product: cosign
  copyright:
    - license: Apache-2.0
  dependencies:
    runtime:
      - ca-certificates-bundle

# ...
We could add other CPE fields over time as we find the need.
Should we use this CPE information in the package's SBOM as well? Probably, although there's an argument to be made that this data is easier to find in the built package's .melange.yaml (whose shape is perhaps less likely to change in the future) than in the /var/lib/db/sbom/... location.
Added to .PKGINFO:
$ tar -xOf ./packages/aarch64/cosign-2.4.1-r5.apk .PKGINFO
# Generated by melange
pkgname = cosign
pkgver = 2.4.1-r5
arch = aarch64
size = 84551068
origin = cosign
pkgdesc = Container Signing
url =
commit = 54fac0d402f48b3e3ae8ed08fa0be0bea9d5923a
builddate = 1737155808
license = Apache-2.0
depend = ca-certificates-bundle
provides = cmd:cosign=2.4.1-r5
# cpe = cpe:2.3:*:sigstore:cosign:2.4.1:*:*:*:*:*:*:*
datahash = 5fe105545021b762f52f908592194cbf9dea87f4e724010282738b24f6100630
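A consumer would have to read the CPE back out of .PKGINFO. Here is a minimal Go sketch of doing that, assuming the file is just `key = value` lines with `#` comments, as in the output above. The `parsePKGINFO` function name is hypothetical. Keys like `depend` and `provides` can repeat, so each key maps to a slice. With the `cpe` line commented out as shown, it is skipped; an uncommented `cpe = ...` line would show up under the `"cpe"` key.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsePKGINFO reads "key = value" lines into a map of key -> values.
// Blank lines and lines starting with "#" are skipped. Only the first "="
// splits key from value, so values like "cmd:cosign=2.4.1-r5" stay intact.
func parsePKGINFO(s string) (map[string][]string, error) {
	fields := make(map[string][]string)
	scanner := bufio.NewScanner(strings.NewReader(s))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("malformed .PKGINFO line: %q", line)
		}
		k := strings.TrimSpace(key)
		fields[k] = append(fields[k], strings.TrimSpace(value))
	}
	return fields, scanner.Err()
}

func main() {
	info, err := parsePKGINFO("pkgname = cosign\ncpe = cpe:2.3:*:sigstore:cosign:2.4.1:*:*:*:*:*:*:*\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(info["pkgname"], info["cpe"])
}
```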
Some "devil's advocate" thoughts are occurring to me about this whole approach:
- Is it really better to spread the mechanism for scanning accuracy across many files in multiple repos, rather than keeping it in one central spot (e.g. in wolfictl)?
- Doesn't having one central spot allow us to apply broader transformations, using things like strings.HasPrefix to codify entire patterns of CPEs?
- Could it be more error-prone to push this responsibility onto individual package files? For example, what if a new version stream of something (e.g. corretto) forgets to add the proper CPE data?
- What if we want to change a CPE across multiple files or even repos? Isn't that harder to do if it's not in one spot like in wolfictl?
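To make the trade-off concrete, here is a minimal Go sketch of how a package-declared CPE and a central `strings.HasPrefix` rule could coexist: the declared value wins, and the prefix rule catches a version stream (e.g. a hypothetical `cosign-2.2`) that forgot to declare one. All names here (`CPEMatch`, `prefixRule`, `centralRules`, `resolveCPE`) are hypothetical, and the single rule is illustrative, not wolfictl's real list.

```go
package main

import (
	"fmt"
	"strings"
)

// CPEMatch is a hypothetical vendor/product pair.
type CPEMatch struct {
	Vendor, Product string
}

// prefixRule is a hypothetical central-list entry: any package whose
// name starts with Prefix is assigned Match.
type prefixRule struct {
	Prefix string
	Match  CPEMatch
}

// centralRules is illustrative only.
var centralRules = []prefixRule{
	{Prefix: "cosign", Match: CPEMatch{Vendor: "sigstore", Product: "cosign"}},
}

// resolveCPE prefers a CPE declared in the package's own build config and
// falls back to the first matching central prefix rule.
func resolveCPE(pkgName string, declared *CPEMatch) (CPEMatch, bool) {
	if declared != nil {
		return *declared, true
	}
	for _, r := range centralRules {
		if strings.HasPrefix(pkgName, r.Prefix) {
			return r.Match, true
		}
	}
	return CPEMatch{}, false
}

func main() {
	m, ok := resolveCPE("cosign-2.2", nil)
	fmt.Println(m, ok)
}
```

Rule order matters in a design like this, since the first matching prefix wins.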
Sanity checking welcome! 😅
Some "devil's advocate" thoughts are occurring to me about this whole approach:
Thank you for triggering an existential crisis on a Saturday.
We need scanning to work both with up-to-date data and in production.
It is reasonable for us to use "up to date" information from the git checkouts (as we already do for the update: and test: stanzas, and so on).
But it is not reasonable to ask scanners to clone wolfi-dev/os at the right point in time just to gather the CPE.
Thus, for the Chainguard security team's needs, I do expect them to use up-to-date data from the latest state of the YAML.
But the scanners likely need something in the images to scan them.
So I wonder if we need to somehow ensure we unpack these somewhere (like in /lib/apk/db, the installed database, etc.).
Or, at a later point, we need to start putting these somewhere else (e.g. an ELF note, or something).
I have a slight preference to keep packaging code & packaging metadata colocated.
Removed the CPE data from .PKGINFO, since we can just get what we need directly from the now embedded .melange.yaml.
I think I should enforce validity of the CPE field values, so that it's impossible for Melange to build with bad CPE data. And then this is probably ready for merge.
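One way that validation could look, as a minimal Go sketch: each vendor/product value must be non-empty, must not be the CPE "not applicable" value `-`, and may only contain letters, digits, `.`, `_` and `-`. That character set is a deliberately stricter subset of what CPE 2.3 formatted strings allow, since it rejects wildcards and backslash-escaped punctuation outright. The `CPE` type and `Validate` method are hypothetical names, not melange's actual API.

```go
package main

import (
	"fmt"
	"regexp"
)

// CPE mirrors the hypothetical vendor/product fields of the proposed cpe: block.
type CPE struct {
	Vendor  string
	Product string
}

// cpeValue accepts only letters, digits, '.', '_' and '-'. Characters such
// as ':' (the component separator), '*', '?' and spaces are rejected.
var cpeValue = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)

// validateField checks a single vendor or product value.
func validateField(name, value string) error {
	switch {
	case value == "":
		return fmt.Errorf("cpe.%s must not be empty", name)
	case value == "-":
		// A lone "-" means "not applicable" in CPE, which makes no sense
		// for a package declaring its own vendor or product.
		return fmt.Errorf("cpe.%s must be a concrete value, not %q", name, value)
	case !cpeValue.MatchString(value):
		return fmt.Errorf("cpe.%s %q contains characters outside [A-Za-z0-9._-]", name, value)
	}
	return nil
}

// Validate returns the first problem found, so a build could fail early.
func (c CPE) Validate() error {
	if err := validateField("vendor", c.Vendor); err != nil {
		return err
	}
	return validateField("product", c.Product)
}

func main() {
	for _, c := range []CPE{
		{Vendor: "sigstore", Product: "cosign"},
		{Vendor: "sigstore", Product: "co:sign"},
	} {
		fmt.Printf("%+v: %v\n", c, c.Validate())
	}
}
```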
Preview of what this enables in wolfictl SBOM and vuln scanning.......... 🔮
$ wolfictl sbom ../wolfi-os/packages/aarch64/cosign-2.4.3-r1.apk -o syft-json | jq '.artifacts | map(select(.type == "apk"))'
[
{
"id": "faf1d6bb7e0fc4cc",
"name": "cosign",
"version": "2.4.3-r1",
"type": "apk",
"foundBy": "wolfictl",
// ...
"cpes": [
{
"cpe": "cpe:2.3:*:sigstore:cosign:2.4.3-r1:*:*:*:*:*:*:*",
"source": "melange-configuration"
}
],
// ...