RPM Namespace is Ambiguous, Need Clarity
The type document states:
The namespace is the vendor such as Fedora or OpenSUSE. It is not case sensitive and must be lowercased.
It isn't clear whether this is the vendor field from the RPM header or a user-supplied value. If it is a user-supplied value, then tooling has to ignore it, because two tools can pick different names for the same vendor (e.g. fedora versus fedora%20project).
If it is the vendor field from the RPM header then the example should be fedora%20project.
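To illustrate the header-based reading: a minimal sketch, assuming the Vendor header value is "Fedora Project" and that the whole string is used as a single namespace segment. The helper name normalize_namespace is hypothetical and not part of any purl library.

```python
from urllib.parse import quote


def normalize_namespace(vendor: str) -> str:
    """Lowercase a vendor string and percent-encode it for use as a purl namespace."""
    # Assumption: the whole vendor string is one namespace segment,
    # so every reserved character (including the space) gets encoded.
    return quote(vendor.strip().lower(), safe="")


print(normalize_namespace("Fedora Project"))
print(normalize_namespace("Fedora"))
```

With this reading, a tool that takes the value from the header and a tool that uses the short name "Fedora" end up with different namespaces for the same package.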
Just need some clarity on whether the namespace for RPMs is supplied by the user / tool or if it should be collected from the RPM header.
Thank you.
To help with this issue, could we also get some examples for Red Hat packages into the PURL-TYPES.rst file? This would go a long way toward clearing up the confusion that is emerging in the community as people use purl, and it would guide people on how to set the "namespace" and also the "distro" qualifier.
In the absence of many examples in PURL-TYPES.rst, we see different scanners use a wide variety of different options, which makes consuming Purls and using them as effective locators quite difficult. For example, from the doc it's not clear which of the following variants would be valid/recommended:
pkg:rpm/[email protected]_6
pkg:rpm/[email protected]_6?distro=rhel
pkg:rpm/[email protected]_6?distro=ubi
pkg:rpm/[email protected]_6?distro=redhat-8.4
pkg:rpm/[email protected]_6?distro=redhat-84
pkg:rpm/rhel/[email protected]_6
pkg:rpm/redhat/[email protected]_6
pkg:rpm/redhat/[email protected]_6?distro=redhat-8.4
...
An official example or two would instantly clarify this, and give the users of purl something to unify around.
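To show why this hurts consumers, here is a rough sketch that splits a few such variants into their parts and compares them. The parser split_rpm_purl is a simplified, hypothetical helper (not a full purl implementation), and the package name examplepkg and its version are placeholders.

```python
from urllib.parse import parse_qsl, unquote


def split_rpm_purl(purl: str) -> dict:
    """Split an rpm purl into namespace, name, version and qualifiers (simplified)."""
    prefix = "pkg:rpm/"
    if not purl.startswith(prefix):
        raise ValueError("not an rpm purl: " + purl)
    rest, _, query = purl[len(prefix):].partition("?")
    path, _, version = rest.partition("@")
    # Everything before the last "/" is the namespace; it may be absent.
    namespace, _, name = path.rpartition("/")
    return {
        "namespace": unquote(namespace).lower() or None,
        "name": unquote(name),
        "version": unquote(version) or None,
        "qualifiers": dict(parse_qsl(query)),
    }


# Placeholder package; only namespace and distro differ between the variants.
variants = [
    "pkg:rpm/examplepkg@1.0-1.el8_6",
    "pkg:rpm/examplepkg@1.0-1.el8_6?distro=rhel",
    "pkg:rpm/rhel/examplepkg@1.0-1.el8_6",
    "pkg:rpm/redhat/examplepkg@1.0-1.el8_6?distro=redhat-8.4",
]
parsed = [split_rpm_purl(v) for v in variants]
identities = {(p["namespace"], p["qualifiers"].get("distro")) for p in parsed}
print(len(identities), "different (namespace, distro) pairs for one package")
```

A consumer matching on the full purl string, or even on namespace plus distro, would treat these as unrelated packages.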
To resolve the "rhel" vs "redhat" ambiguity - perhaps the namespace and/or distro should be defined as the string in the /etc/OSNAME-release file? Or something from within the RPM?
If the redhat packages come from https://cdn-ubi.redhat.com/content/public/ubi/dist/ubiVERSION then should namespace and/or provider be set to ubi or ubi-VERSION, or is "redhat" the right string to use?
Currently everyone is making up their own interpretation, which stops purl from being a reliable, universal way to locate RPMs.
I'd like to revive this thread after the fairly recent release of public PURL guidelines meant for mainly Red Hat products: https://redhatproductsecurity.github.io/security-data-guidelines/purl/
In short, that guideline still doesn't clarify how to infer the namespace value. What it does do is replace repo URLs with repo IDs, because of mirrors and because some URLs point to a paid CDN (however questionable that may appear from the upstream PURL perspective). It also gives no guidance on what to do with 3rd party vendor-hosted RPM packages that aren't affiliated with any repository (neither does the upstream PURL spec). Lastly, the Red Hat PURL guideline allows putting src into the arch qualifier if the RPM package is in fact a source RPM (also something the upstream PURL spec doesn't cover).
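For readers who haven't seen that style of purl, a minimal sketch of what a source RPM identified by a repository ID might look like. The builder build_rpm_purl is a hypothetical helper, and the package name, version, and repository ID are made-up placeholders rather than values taken from the guideline.

```python
from urllib.parse import quote


def build_rpm_purl(namespace: str, name: str, version: str, qualifiers: dict) -> str:
    """Assemble an rpm purl string from its parts (simplified sketch)."""
    purl = "pkg:rpm/{}/{}@{}".format(
        quote(namespace.lower(), safe=""),
        quote(name, safe=""),
        quote(version, safe=""),
    )
    # Qualifiers are emitted sorted by key so the output is stable.
    pairs = [
        f"{key}={quote(value, safe='')}"
        for key, value in sorted(qualifiers.items())
        if value
    ]
    return purl + ("?" + "&".join(pairs) if pairs else "")


# Placeholder values: a source RPM located by repository ID instead of a URL.
print(build_rpm_purl(
    "redhat",
    "examplepkg",
    "1.0-1.el8_6",
    {"repository_id": "example-baseos-rpms", "arch": "src"},
))
```

Note that the namespace is still passed in by hand here, which is exactly the part that remains undefined.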
So my ask of the community here would be:
- can we work together to find a reasonable way of inferring the namespace field? IOW, the Vendor RPM tag is listed under the informative tags, which means it's effectively optional
- how private content is represented, IOW what if the repository is private (internally hosted) or behind a CDN paywall? Would the community be willing to follow Red Hat's lead on that and accept repository_id as the uniqueness guaranteeing field?
- can the spec be enhanced to deal with source RPMs as well as packages hosted on vendor controlled web sites?
FWIW our team recently decided to adopt the Red Hat guideline for the time being, until the PURL spec maintained here is more bulletproof (apart from this PURL spec and the guideline in question there isn't anything else public, IIRC): https://github.com/containerbuildsystem/cachi2/pull/600
If I look at os-release
https://www.freedesktop.org/software/systemd/man/latest/os-release.html
It has an interesting value, CPE_NAME, which has the vendor defined:
https://en.wikipedia.org/wiki/Common_Platform_Enumeration#vendor
And with:
source /etc/os-release && echo "$CPE_NAME" | cut -d ':' -f 3
you will get "redhat" for RHEL and "fedoraproject" for Fedora, "almalinux" for Alma...
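The same extraction could be sketched in Python so that it also copes with the CPE 2.3 formatted-string form, where the vendor sits one field later. The sample os-release text is illustrative only, and the helper names parse_os_release and cpe_vendor are hypothetical.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=VALUE lines of an os-release file into a dict (simplified)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def cpe_vendor(cpe_name: str):
    """Return the vendor component of a CPE name, or None if it is missing."""
    parts = cpe_name.split(":")
    # CPE 2.2 URI:  cpe:/o:vendor:product:...    -> vendor is field index 2
    # CPE 2.3 form: cpe:2.3:o:vendor:product:... -> vendor is field index 3
    index = 3 if cpe_name.startswith("cpe:2.3:") else 2
    return parts[index] if len(parts) > index and parts[index] else None


# Illustrative sample, not copied from a real system.
SAMPLE = '''
NAME="Red Hat Enterprise Linux"
ID="rhel"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
'''
print(cpe_vendor(parse_os_release(SAMPLE)["CPE_NAME"]))
```

On a real system the text would come from /etc/os-release; CPE_NAME is an optional field there, so a fallback would still be needed when it is absent.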
If you agree, I can prepare a PR that states that the vendor should be extracted from CPE_NAME in the os-release file.
Note that there is a related PR in the queue, https://github.com/package-url/purl-spec/pull/370, but an enumeration IMO does not scale. It defines the most popular ones, but the rest will stay in the dark.
But what about 3rd party RPM repos? Think EPEL.
If we use os-release at build time, some EPEL packages would say redhat and some would say centos. And if I construct the purl at runtime, it might as well say almalinux. In fact, I'd expect it to say epel all the time (or alternatively even fedora(project)).
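A small sketch of that concern, assuming the namespace is taken from the CPE_NAME of whichever host constructs the purl. The CPE strings, the package name examplepkg, and its version are placeholders, and purl_from_host is a hypothetical helper.

```python
# Illustrative CPE_NAME values for three hosts that might handle one EPEL package.
HOST_CPE_NAMES = {
    "RHEL build host": "cpe:/o:redhat:enterprise_linux:8",
    "CentOS build host": "cpe:/o:centos:centos:8",
    "AlmaLinux runtime host": "cpe:/o:almalinux:almalinux:8",
}


def purl_from_host(cpe_name: str, name: str, version: str) -> str:
    """Build an rpm purl whose namespace is the vendor field of the host's CPE_NAME."""
    vendor = cpe_name.split(":")[2]  # third colon-separated field, as in the cut command
    return f"pkg:rpm/{vendor}/{name}@{version}"


purls = {
    host: purl_from_host(cpe, "examplepkg", "1.0-1.el8")
    for host, cpe in HOST_CPE_NAMES.items()
}
for host, purl in purls.items():
    print(f"{host}: {purl}")
```

The same package ends up with three different purls, and none of them says epel.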
how private content is represented, IOW what if the repository is private (internally hosted) or behind a CDN paywall? Would the community be willing to follow Red Hat's lead on that and accept repository_id as the uniqueness guaranteeing field?
repository_id can contain colons. For example, Copr is actively using them: "copr:copr.fedorainfracloud.org:group_mock:mock"
Would that be a problem for the PURL syntax?
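As a quick check, here is a sketch of how such a value behaves as a qualifier value, both with the colons percent-encoded and left literal. It does not settle which form the spec requires; it only shows that a simple split on "?", "&" and "=" recovers the value either way. The package name examplepkg is a placeholder.

```python
from urllib.parse import parse_qsl, quote, unquote

repository_id = "copr:copr.fedorainfracloud.org:group_mock:mock"

# Two candidate encodings of the qualifier value.
encoded = quote(repository_id, safe="")   # colons become %3A
literal = quote(repository_id, safe=":")  # colons are kept as-is

for value in (encoded, literal):
    purl = "pkg:rpm/examplepkg@1.0?repository_id=" + value
    # The qualifiers are everything after the first "?".
    query = purl.partition("?")[2]
    qualifiers = dict(parse_qsl(query))
    print(value, "->", qualifiers["repository_id"])
```

In both cases the colons sit after the "?" separator, so they do not collide with the "pkg:" scheme prefix in this simple parse.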