cdxgen
Generated purl for OCI images includes namespace, which is not allowed by purl spec
When generating an SBOM using cdxgen with -t docker, the resulting BOM includes a purl of type oci that incorrectly uses the repository URL as the namespace, which is explicitly disallowed by the current purl specification.
Steps to Reproduce:
- Run:
cdxgen -t docker -o bom.json registry.relizahub.com/library/rearm-cli@sha256:696a2e4d457df5be966a4570d9695905b3d0afcf69d7728f0746d836504c4fce
- Observe in bom.json:
"purl":"pkg:oci/registry.relizahub.com/library/rearm-cli@sha256:696a2e4d457df5be966a4570d9695905b3d0afcf69d7728f0746d836504c4fce"
Problem:
According to the purl spec for OCI:
OCI purls do not contain a namespace, although, repository_url may contain a namespace as part of the physical location of the package.
This means:
- The namespace should be omitted from the purl.
- Information like registry.relizahub.com/library or ghcr.io/org/ should instead go into a repository_url field.
Including the namespace violates the spec and may cause issues with tooling that strictly parses purl.
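To make the strict-parsing concern concrete, here is a minimal Python sketch of the kind of check a strict consumer might run. The helper name oci_purl_namespace is hypothetical, and the parsing is deliberately simplified to the pkg:oci/<namespace>/<name>@<version> shape shown above. It is not taken from cdxgen or any purl library.

```python
from typing import Optional


def oci_purl_namespace(purl: str) -> Optional[str]:
    """Return the namespace segment of an OCI purl, or None if there is none.

    Hypothetical helper: a simplified parser, not a full purl implementation.
    """
    prefix = "pkg:oci/"
    if not purl.startswith(prefix):
        raise ValueError("not an OCI purl")
    # Strip the subpath and qualifiers first, then the version,
    # so that only the <namespace>/<name> path remains.
    path = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    path = path.split("@", 1)[0]
    # Everything before the last "/" is treated as the namespace.
    namespace, _, _name = path.rpartition("/")
    return namespace or None


generated = (
    "pkg:oci/registry.relizahub.com/library/rearm-cli"
    "@sha256:696a2e4d457df5be966a4570d9695905b3d0afcf69d7728f0746d836504c4fce"
)
print(oci_purl_namespace(generated))  # registry.relizahub.com/library
```

A consumer that enforces the "no namespace" rule would reject any OCI purl for which this returns a non-empty value.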
There is also an open issue for this on purl-spec: "OCI PURL type should allow namespace declaration" (#425).
Suggested Fix:
- Remove the namespace segment from OCI purls.
- Move repository or registry details to a repository_url qualifier (e.g., pkg:oci/rearm-cli@sha256:696a2e4d457df5be966a4570d9695905b3d0afcf69d7728f0746d836504c4fce?repository_url=registry.relizahub.com/library).
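As a rough illustration of the suggested transformation, the following Python sketch moves the namespace into a repository_url qualifier. The function name move_namespace_to_repository_url is hypothetical, and the sketch assumes the input has no existing qualifiers or subpath. It is not how cdxgen builds purls; it only mirrors the before and after strings in this report.

```python
def move_namespace_to_repository_url(purl: str) -> str:
    """Rewrite pkg:oci/<namespace>/<name>@<version> without a namespace.

    Hypothetical helper: the namespace, if present, is appended as a
    repository_url qualifier. The version string is left untouched.
    """
    prefix = "pkg:oci/"
    if not purl.startswith(prefix):
        raise ValueError("not an OCI purl")
    rest = purl[len(prefix):]
    if "?" in rest or "#" in rest:
        # Assumption: merging with existing qualifiers is out of scope here.
        raise ValueError("qualifiers and subpaths are not handled in this sketch")
    path, at_sign, version = rest.partition("@")
    namespace, _, name = path.rpartition("/")
    result = prefix + name + at_sign + version
    if namespace:
        result += "?repository_url=" + namespace
    return result


before = (
    "pkg:oci/registry.relizahub.com/library/rearm-cli"
    "@sha256:696a2e4d457df5be966a4570d9695905b3d0afcf69d7728f0746d836504c4fce"
)
print(move_namespace_to_repository_url(before))
```

For the purl from the reproduction steps, this prints the same string as the example in the suggested fix.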
Version Info:
cdxgen version: 11.2.6
Thank you for this report. We need some time to think this through since it's a breaking change.
@setchy any thoughts on this issue?
Thanks for reporting @logicflakes - looks like we need to update our OCI parsing logic to be compliant with the purl spec. I'm curious how downstream platforms like Dependency Track would display the repository details.
This is another purl weirdness. I am now facing a situation where the repository_url is not the same as the namespace. OCI images can be published to multiple registries from the same repository.
An ideal spec should have no opinion about the namespace or the name attribute.
> This is another purl weirdness. I am now facing a situation where the repository_url is not the same as the namespace. OCI images can be published to multiple registries from the same repository.
Could you give an example of this? My understanding was that repository_url essentially means registry URL, so images pushed to multiple registries should have distinct Purls.
The repository could be on github.com or codeberg.org, and the registry could be on quay.io or even ghcr.io.
I believe repository_url here does not mean the source code repository.
From the current purl spec definition (https://github.com/package-url/purl-spec/blob/main/PURL-TYPES.rst):
repository_url: A repository URL where the artifact may be found, but not intended as the only location. This value is encouraged to identify a location the content may be fetched.
So this is a registry URL, and currently the suggestion is to arbitrarily pick one if the image can be found in different locations.
Personally, I don't find this ideal, but the definition is clear enough to implement against.
So in your example, we should pick either quay.io or ghcr.io.
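To show what "distinct purls per registry" and "pick one" would look like, here is a small Python sketch. The function name, the image name my-image, the org example-org and the shortened digest are all made up for illustration; only quay.io and ghcr.io come from the example above.

```python
from typing import List


def oci_purls_per_registry(name: str, digest: str, repository_urls: List[str]) -> List[str]:
    """Build one namespace-free OCI purl per registry location.

    Hypothetical helper: the same digest yields a distinct purl for each
    repository_url, in the order the locations are given.
    """
    return [
        f"pkg:oci/{name}@{digest}?repository_url={url}"
        for url in repository_urls
    ]


purls = oci_purls_per_registry(
    "my-image",                 # hypothetical image name
    "sha256:0123abcd",          # hypothetical, shortened digest
    ["quay.io/example-org", "ghcr.io/example-org"],
)
for purl in purls:
    print(purl)

# "Arbitrarily pick one": here simply the first location listed.
chosen = purls[0]
```

Which location ends up in the BOM is then a policy decision of the generator, which is the part I don't find ideal.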
I'm not convinced that the cost of breaking all the downstream tools (including my own) is worth it in this case. OCI images are always referred to using the full name, including the registry, unless there is a default registry setting in the client tools. Even then, it's a security nightmare, and most people recommend referring to images using the full name, including the hash.
I think I'm going to stop at 80% or 90% compliant with purl (and even CycloneDX for that matter).