Define package types for ASF projects (Apache Software Foundation)
We should define a package type for ASF projects (Apache Software Foundation)
The spec originally mentioned apache as the type for Apache project packages. The direction may be to use asf rather than apache.
There have been ongoing discussions on the ASF mailing lists on this topic. We need to collect links to these threads for reference and invite the ASF folks to join and help define this (important) package type!
@raboof ping
Do you have the links to the threads? I'm curious what the use case is. Project != Package. In fact I have asserted for >10 years that the major problem with CPE is that it maps to just a project... where a project like Struts has ~80 packages, making it useless for most use cases. Having a pURL recreate that lossy coordinate would be a huge step backwards.
Do you have the links to the threads?
https://lists.apache.org/thread/vc3h1t7plq3sgtqvp385s4nlo3l7rry7 and https://lists.apache.org/thread/75l9f8bcs9fm232p2j3prbj9fw2or2k5 come to mind.
the major problem with CPE is that it maps to just a project... where a project like Struts has ~80 packages, making it useless for most use cases. Having a pURL recreate that lossy coordinate would be a huge step backwards.
That would be good to flesh out. I could see an approach where we use the PMC id as the first segment, and the PMC can determine whether/how to add further detail - something like pkg:asf/celix could perhaps stand on its own, while struts might introduce pkg:asf/struts/oval-plugin etc for its various components. We should probably give some guidance on how to apply that. WDYT?
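To make the idea concrete, here is a minimal sketch of how such identifiers could be assembled. It assumes a hypothetical asf type where the PMC id is the first segment and any further segments are chosen by the PMC; the function name asf_purl and the optional version handling are illustrative only and are not part of the purl spec.

```python
from urllib.parse import quote


def asf_purl(pmc, *components, version=None):
    """Build a purl for a hypothetical 'asf' type (not a registered purl type).

    The PMC id is the first segment; extra segments are PMC-defined components.
    """
    if not pmc:
        raise ValueError("a PMC id is required")
    # Percent-encode each segment separately so a '/' inside a segment
    # cannot be mistaken for a segment separator.
    segments = [quote(segment, safe="") for segment in (pmc, *components)]
    purl = "pkg:asf/" + "/".join(segments)
    if version:
        purl += "@" + quote(version, safe="")
    return purl


print(asf_purl("celix"))
print(asf_purl("struts", "oval-plugin"))
```

A PMC with a single product would use only the first segment, while a PMC like Struts could add one segment per component.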
Most Apache projects are already covered by package ecosystems that purl supports. See https://projects.apache.org/projects.html?language
Per definition, a purl is:
a URL string used to identify and locate a software package...
I cannot locate pkg:asf/struts/oval-plugin. Is it on Maven Central or somewhere else? Additionally, oval-plugin already has a purl, which is:
pkg:maven/org.apache.struts/[email protected]
Adding pkg:asf/struts/oval-plugin would therefore introduce confusion IMO.
That would be good to flesh out. I could see an approach where we use the PMC id as the first segment, and the PMC can determine whether/how to add further detail - something like pkg:asf/celix could perhaps stand on its own, while struts might introduce pkg:asf/struts/oval-plugin etc for its various components. We should probably give some guidance on how to apply that. WDYT?
Yes, we could follow the directory structure in downloads.apache.org (or archive.apache.org) to provide the coordinates of a source archive (which is the only official product of the ASF). For example:
pkg:apache/logging/[email protected], pkg:apache/[email protected], pkg:apache/tomcat/tomcat-11%[email protected], pkg:apache/commons/lang%[email protected].
However, as the examples show, the location of the source archives varies greatly between PMCs and even varies over time (older Log4j Kotlin API releases were in logging/log4j/kotlin, but newer ones are in logging/log4j-kotlin). The naming of the source archives is even less regular.
I think that right now there is no sense in introducing an asf type. We could think about that when Apache Trusted Releases is out and forces some order in this chaos.
Since this ticket was originally created, it has become possible to use the SWID PURL type.
For example:
pkg:swid/Apache Software Foundation/apache.org/logging/[email protected]?tag_id=de20a672-4e41-449a-adf7-945f978ede9d
@ppkarwasz in our private chat, you wrote:
The archives in this folder are a problem: https://downloads.apache.org/solr/solr/9.8.1/, we need a way to give a PURL to it. The generic type has the problem that the name is made-up and not really canonical.
With an apache type we could have something like this:
- For https://downloads.apache.org/solr/solr/9.8.1/ a base of pkg:apache/solr/[email protected]
- For https://downloads.apache.org/solr/solr/9.8.1/solr-9.8.1-src.tgz a PURL of pkg:apache/solr/[email protected]?file_name=9.8.1/solr-9.8.1-src.tgz
... where solr is a name of a top level project as in https://solr.apache.org/
And for logging/log4j:
- home is https://logging.apache.org/log4j
- base purl is pkg:apache/logging/log4j
- for https://downloads.apache.org/logging/log4j/2.24.3/apache-log4j-2.24.3-bin.zip the PURL would be: pkg:apache/logging/[email protected]?file_name=apache-log4j-2.24.3-bin.zip
which would account for when things move to the archive, like for the ever popular https://archive.apache.org/dist/logging/log4j/1.2.17/
And yes, there is always the CPE-assigned name issue raised by @brianf in https://github.com/package-url/purl-spec/issues/305#issuecomment-2168387802, which is real. Yet this may be better than a generic purl or a swid, especially if we can check that the convention works OK across ASF projects.
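As a rough way to check whether the convention holds across projects, the mapping from a download URL to a purl could be sketched like this. This is only an illustration under stated assumptions: the apache type is not registered, the function name is made up, the version directory is guessed to be the first path segment starting with a digit, and the file_name qualifier follows the Log4j example (file name only), not the Solr one (which also carries the version directory).

```python
from urllib.parse import quote, urlsplit


def apache_purl_from_download_url(url):
    """Derive a purl of the hypothetical 'apache' type from a download URL."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # archive.apache.org serves the same tree below a leading "dist" segment.
    if parts and parts[0] == "dist":
        parts = parts[1:]
    # Assumption: the first segment that starts with a digit is the version
    # directory. A project directory starting with a digit would break this.
    for index, segment in enumerate(parts):
        if segment[0].isdigit():
            break
    else:
        raise ValueError("no version directory found in " + url)
    coordinates, version, rest = parts[:index], segment, parts[index + 1:]
    if not coordinates:
        raise ValueError("no project path found in " + url)
    purl = "pkg:apache/" + "/".join(quote(c, safe="") for c in coordinates)
    purl += "@" + quote(version, safe="")
    if rest:
        # Anything below the version directory identifies a single file.
        purl += "?file_name=" + quote("/".join(rest), safe="/")
    return purl


print(apache_purl_from_download_url(
    "https://downloads.apache.org/logging/log4j/2.24.3/apache-log4j-2.24.3-bin.zip"))
print(apache_purl_from_download_url(
    "https://archive.apache.org/dist/logging/log4j/1.2.17/"))
```

Because the host is dropped, a release keeps the same purl when it moves from downloads.apache.org to archive.apache.org.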
This is a big change. If PURL is going to support both ecosystems and vendors for the type, then I would be prepared for an influx of organizations and foundations trying to reserve names. At least with SWID, they would be able to use it without having an influx of PURL types being defined. Additionally, what happens when two vendors, both named Acme, try to use it? Using apache without using DNS, one might assume Apache Corporation instead of Apache Software Foundation.
The other big change is location. In the case of Apache, it's fairly simple, since they have a /logging/log4j directory structure. But what happens once a precedent has been set that vendors can have PURL types, and Acme's download server uses something like https://download.example.com?fid=0123456789? I certainly would not want to end up with something like pkg:/acme/0123456789 or (worse) pkg:/acme/my%20application?fid=0123456789 where 0123456789 is the unique identifier to download the package.
I would seriously consider using either CPE or the SWID PURL type instead of introducing vendor-specific PURL types. If the SWID PURL type needs revisiting, that's fine, we can do that. But I have a lot of reservations about the precedent this would set, not about ASF specifically.
I agree that vendor-specific types are a dangerous precedent, but:
- CPE is not a URL, in the sense that it does not let you locate a product.
- SWID is behind a paywall: does it guarantee uniqueness of the generated identifiers?
Some DNS-based type would be probably ideal with a syntax like pkg:dns/<domain>/<domain_dependent_path>/<product>. The "default" download URL could be provided in a metadata file in a .well-known of <domain>.
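A minimal sketch of how a consumer might handle such an identifier, assuming the pkg:dns syntax above. The dns type does not exist today, and the metadata file name purl.json, the function name, and the returned field names are all made up for illustration.

```python
from urllib.parse import unquote

# Hypothetical file name: nothing like this is registered under .well-known.
WELL_KNOWN_FILE = "purl.json"


def parse_dns_purl(purl):
    """Split a hypothetical pkg:dns purl into domain, path, and product."""
    prefix = "pkg:dns/"
    if not purl.startswith(prefix):
        raise ValueError("not a pkg:dns purl: " + purl)
    # An optional '@version' suffix is separated first; qualifiers are ignored.
    remainder, _, version = purl[len(prefix):].partition("@")
    segments = [unquote(s) for s in remainder.split("/") if s]
    if len(segments) < 2:
        raise ValueError("expected at least <domain>/<product>")
    domain, *path, product = segments
    return {
        "domain": domain,
        "path": path,
        "product": product,
        "version": version or None,
        # Where the "default" download URL metadata could be looked up.
        "metadata_url": "https://" + domain + "/.well-known/" + WELL_KNOWN_FILE,
    }


print(parse_dns_purl("pkg:dns/apache.org/logging/log4j"))
```

The identifier itself stays location-free; only the domain is used to find the metadata that says where downloads live.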
I agree that vendor-specific types are a dangerous precedent
That has been part of my hesitation as well: "we" (generally) need a way to refer to software that's not in any of the defined ecosystems. Assuming we have a good way to do that, perhaps we don't need a separate namespace for "ASF software that is not in any of the defined namespaces". OTOH, especially with the '
* CPE is not a URL, in the sense that it does not let you locate a product.
I'm not sure if that's really a requirement? I would be OK with just being able to 'refer to' software tbh, and leave the 'location' of products to be defined elsewhere.
The problem I see with CPE is that it's not federated, so you need to talk to the NVD CPE team (https://nvd.nist.gov/products/cpe) before using a new CPE. This would mean an explosion of the CPE dictionary if you'd want to use it to refer to software in e.g. an SBOM - I just don't see how that would ever scale in practice.
Also @brianf mentioned above that you cannot use CPE to refer to 'sub-artifacts' of projects, which would be a problem.
* SWID is behind a paywall: does it guarantee uniqueness of the generated identifiers?
I think https://datatracker.ietf.org/doc/rfc9393/ lets you 'read between the lines' of what 'regular' SWID looks like as well. Here the importance of the mandatory globally unique 'tagId' makes it awkward to use in our context. I'm sure we could find a 'hack' to make it 'technically work', but...
Some DNS-based type would be probably ideal with a syntax like pkg:dns/<domain>/<domain_dependent_path>/<product>. The "default" download URL could be provided in a metadata file in a .well-known of <domain>.
Something like this, or even the generic type (perhaps with some conventions around it), might then be a better fit?
OTOH, perhaps we have enough 'structure' in the ASF to justify having an asf Purl type to mirror that structure - I do think the examples @pombredanne gives above are compelling...
For SWID specifically, refer to https://package-url.github.io/purl-swid-generator-ui/ for an easy way to generate them - and it relies on DNS per the specification. While the SWID spec doesn't strictly enforce UUIDs, it strongly recommends them, so yes, in most cases the SWID identifier will be globally unique.
For SWID specifically, refer to https://package-url.github.io/purl-swid-generator-ui/ for an easy way to generate them
Sure - the challenge is not formatting the SWID, but properly 'managing' the tag id. If you randomly generate a new tag id every time you want to refer to the same artifact, that makes the tag id useless. If you want to always use the same tag id for the same artifact, then you either need some 'tag registry', or distribute the tag inside the artifact, or get creative and base the tag on a hash of the other fields (but that sounds like a hack). This seems to unnecessarily complicate things and be at odds with the Purl principle that ideally the Purl should 'speak for itself' from context as much as possible.
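For illustration, the "hash of the other fields" option could look like the sketch below, which uses name-based (version 5) UUIDs from the Python standard library. This is not something the SWID spec or the purl spec prescribes; the function name and the choice of fields (domain, name, version) are assumptions.

```python
import uuid


def deterministic_tag_id(domain, name, version):
    """Derive a repeatable tag id from the other fields (a sketch, not a standard)."""
    # A per-domain namespace keeps identical names under different domains apart.
    namespace = uuid.uuid5(uuid.NAMESPACE_DNS, domain)
    # uuid5 hashes the namespace and the name, so equal inputs always
    # produce the same UUID and no tag registry is needed.
    return str(uuid.uuid5(namespace, name + "@" + version))


first = deterministic_tag_id("apache.org", "logging/log4j", "2.24.3")
second = deterministic_tag_id("apache.org", "logging/log4j", "2.24.3")
print(first == second)
```

Two parties computing the tag id independently get the same value, but the id then carries no information beyond the fields it was derived from, which is the redundancy described above.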