Proposal: Add version ranges

Open david-a-wheeler opened this issue 4 years ago • 16 comments

Package-URL by itself isn't very good for reporting about vulnerabilities because it cannot report version ranges. CPE, by contrast, can report version ranges.

This commit proposes adding version range support to package_URL. I think node-semver is pretty common, so I referenced that; other options are possible too. Note that even if you don't use semver, the version range still works, since hierarchical numbers are very common.

Signed-off-by: David A. Wheeler [email protected]

david-a-wheeler avatar Oct 01 '20 02:10 david-a-wheeler

This PR does not address the issues identified in #66 and #84.

stevespringett avatar Oct 01 '20 02:10 stevespringett

Thanks for the cross-reference!

I think this does handle Debian OR. Doesn't "||" provide the same functionality? If not, please enlighten me!

Handling epochs is a bigger issue. But epochs work very much like a "hidden 0. prefix" on a typical version number, and people who follow SemVer won't have epochs. So I think this could be easily modified to support that case. Something like "Versions that begin with the form NUMBER: are presumed to have an epoch of that number; if unstated the epoch is 0. Epochs act as the topmost number in a version."
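To make the epoch rule concrete, here is a minimal sketch of it in Python. It assumes purely numeric, dot-separated versions with an optional `NUMBER:` prefix; the function name `parse_version` is hypothetical and not part of any purl library.

```python
def parse_version(text):
    """Turn 'EPOCH:A.B.C' or 'A.B.C' into a comparable tuple of integers.

    Assumption: every component is a plain integer (no pre-release tags).
    """
    epoch, sep, rest = text.partition(":")
    if not sep:
        # No explicit epoch: presume an epoch of 0.
        epoch, rest = "0", text
    # The epoch is placed first, so it acts as the topmost number.
    return (int(epoch), *(int(part) for part in rest.split(".")))


# A higher epoch outranks any version from a lower epoch.
newer = parse_version("1:0.9")
older = parse_version("2.5.0")
print(newer > older)
```

Because Python compares tuples element by element, putting the epoch in the first position is enough to make it dominate the rest of the version.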

I'm absolutely not married to node-semver range syntax; others would be fine. However, I think it's important to have some syntax for it, and I thought it'd be easier to start with a specific concrete proposal.

I did pick node-semver intentionally. One reason is that it can express some fairly complex situations. For example, it can relatively easily handle the "multiple streams of versions" case.
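As an illustration of the "multiple streams" case, the sketch below evaluates a small subset of node-semver range syntax: space-separated comparators are ANDed, and `||` separates alternatives. It is not the node-semver implementation; carets, tildes, hyphen ranges, and pre-release tags are left out, and the names `satisfies` and `to_tuple` are made up for this example.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}


def to_tuple(version):
    """Convert a dotted numeric version such as '1.2.3' to (1, 2, 3)."""
    return tuple(int(part) for part in version.split("."))


def check(version_tuple, comparator):
    """Evaluate one comparator such as '>=1.2.0' against a version tuple."""
    for symbol in (">=", "<=", ">", "<", "="):  # two-character operators first
        if comparator.startswith(symbol):
            return OPS[symbol](version_tuple, to_tuple(comparator[len(symbol):]))
    # A bare version means an exact match.
    return version_tuple == to_tuple(comparator)


def satisfies(version, range_expr):
    """True if the version matches any '||'-separated alternative."""
    v = to_tuple(version)
    for alternative in range_expr.split("||"):
        comparators = alternative.split()
        if comparators and all(check(v, c) for c in comparators):
            return True
    return False


# Two maintained streams, each with its own affected range.
affected = ">=1.2.0 <1.3.0 || >=2.0.0 <2.1.4"
print(satisfies("1.2.5", affected), satisfies("2.1.4", affected))
```

One limitation of this sketch: versions are compared as plain tuples, so `1.2` and `1.2.0` are not treated as equal. Writing every version with the same number of components avoids the issue.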

david-a-wheeler avatar Oct 01 '20 02:10 david-a-wheeler

In addition to the above, one additional aspect that will need to be addressed is the default repository url for each PURL type. Introducing a range may alter where a component is located, so we'll need to specify what that means.

E.g.

  • Does a range eliminate the default repository url entirely?
  • Does a range continue to have a default repository url, but a list of alternatives may be specified?
  • Does the range only work if the location does not change and a new range has to be used for those where the repo url is different?
  • Something else?

stevespringett avatar Oct 01 '20 02:10 stevespringett

@stevespringett - I think that's backwards. Instead, the version numbers necessarily only apply to the given repo URL (default URL if there's a default and no other URL is given, or a specified URL if one is specified).

For example, there are many different programs all called "zlib"; some of them have the same underlying origin. If I specify version X.Y.Z, that version number would only apply to the repo URL given. If there's a different repo URL, there's no reason to believe that version X.Y.Z refers to the same thing.

david-a-wheeler avatar Oct 01 '20 04:10 david-a-wheeler

With NIST moving away from CPE to SWID, how does this impact PURL?

johnmod3 avatar Oct 01 '20 13:10 johnmod3

I believe SWID has two big problems: (1) it's a huge XML file, making processing it very inconvenient, and (2) it can't refer to version ranges. This makes moving from CPEs hard. A big advantage of SWID over package-URL is that SWID can refer to closed source software where no VCS or download source is publicly visible.

If pURL supports version ranges and has a mechanism for referring to software home pages (or something else that supports closed source software), then package-URL suddenly has big advantages over SWID for many purposes.

david-a-wheeler avatar Oct 01 '20 15:10 david-a-wheeler

Hi David, totally agree - but NIST seems to be hell-bent on going with SWID (even though Microsoft doesn't seem to use it any longer) - JC has way more context

johnmod3 avatar Oct 01 '20 16:10 johnmod3

David. I think the zlib example is incomplete. To my knowledge, the zlib project does not have a default distribution repository. However, distributions that package zlib and distribute it to Conan, Buckaroo, Anaconda, Debian, Redhat, etc, will have a default repository.

I think a cleaner example (and one more representative of the majority of software packages) is a project that distributes its package to a repository when released. Typically, each ecosystem has a default package repository: the npm registry for Node.js, Maven Central for Java/Maven projects, etc.

My prior point was that if you introduce version ranges, we need to also account for how those package versions can be resolved.

For example, the https://github.com/everit-org/json-schema/ project previously distributed all their compiled packages to Maven Central. However, Maven Central lacks certain security controls and transparency, so the json-schema project moved the default repository for the project to use Jitpack instead. Newer versions of json-schema are not available on Maven Central. Therefore, version ranges need to account for these scenarios.

The first sentence of the PURL spec is:

A purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages.

So location information needs to be addressed in version ranges when the default repository moves between versions.

stevespringett avatar Oct 01 '20 17:10 stevespringett

Also FYI, I chatted with @pombredanne about trying to represent SWID in PURL, and there seems to be interest in that.

Also regarding SWID, the attributes of the SoftwareIdentity field can be represented in PURL and can be used for vulnerability use cases - whenever the NVD gets around to supporting it. I have a few different ideas how that can work and will be submitting a PR - however, I've been holding off until #79 is merged.

stevespringett avatar Oct 01 '20 17:10 stevespringett

I agree that this scenario needs to be addressed. However, I think it would be cleaner for people who need to support ranges to also support multiple package URLs separated by whitespace. Then you could have a package URL with a version range that refers to a particular location. This also cleanly deals with the case where version numbers are different for different locations, a common challenge. It also deals with the case where many different packages share an underlying problem; for example, a vulnerability in a specification can affect multiple packages, each with many versions.
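A rough sketch of what consuming such a whitespace-separated list might look like follows. The purls use a made-up `org.example` package, the helper names are hypothetical, and the sketch assumes each version (or range) is written without literal spaces so that splitting on whitespace is safe.

```python
def split_purls(text):
    """Split a whitespace-separated list of package URLs and sanity-check each."""
    purls = text.split()
    for purl in purls:
        if not purl.startswith("pkg:"):
            raise ValueError(f"not a package URL: {purl!r}")
    return purls


def coordinates_and_version(purl):
    """Return (everything before '@', version or None), ignoring qualifiers."""
    body = purl.partition("?")[0]
    coordinates, sep, version = body.rpartition("@")
    return (coordinates, version) if sep else (body, None)


# One entry per location; the version numbers differ between the two.
advisory = (
    "pkg:maven/org.example/lib@1.4.0 "
    "pkg:maven/org.example/lib@2.0.1?repository_url=https://repo.example.org"
)
for purl in split_purls(advisory):
    print(coordinates_and_version(purl))
```

Keeping one package URL per location means the version in each entry only needs to make sense for that location.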

I think that would be much cleaner than trying to link version numbers and repository locations within a single package URL.

david-a-wheeler avatar Oct 01 '20 21:10 david-a-wheeler

I do agree that it should have ranges. I added some reasoning behind it here, on one of the OSSF issues, addressed to @pombredanne.

kerberosmansour avatar Oct 02 '20 16:10 kerberosmansour

As I noted earlier, I'm not married to this specific version syntax. There are several to choose from.

Here's another: https://raw.githubusercontent.com/CVEProject/automation-working-group/master/cve_json_schema/v5.x_discuss/cve513.schema

I think the key is to be able to support a range of version numbers. The "special" case is epoch numbers, which many systems can't handle, but as I noted earlier I think those are relatively easy to add.

david-a-wheeler avatar Oct 05 '20 16:10 david-a-wheeler

@david-a-wheeler first thank you++ for this!

I am not too much in favor of overloading the version with a range syntax as I feel it will be a source of confusion. I would rather craft a new qualifier for that instead. What do you think? There are also considerations wrt. the many versioning schemes and how versions can be compared that I collected in https://github.com/package-url/purl-spec/issues/84#issuecomment-707766899

pombredanne avatar Oct 13 '20 14:10 pombredanne

@stevespringett re:

My prior point was that if you introduce version ranges, we need to also account for how those package versions can be resolved.

It makes sense, and in particular how they are compared matters quite a bit.

For example, the https://github.com/everit-org/json-schema/ project previously distributed all their compiled packages to Maven Central. However, Maven Central lacks certain security controls and transparency, so the json-schema project moved the default repository for the project to use Jitpack instead. Newer versions of json-schema are not available on Maven Central. Therefore, version ranges need to account for these scenarios.

The first sentence of the PURL spec is:

A purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages.

So location information needs to be addressed in version ranges when the default repository moves between versions.

IMHO in this case location information would come from specifying a repository_url=https://jitpack.io qualifier for the package versions that are there; dealing with this specific package and its move from one place to another would have to be handled specially. It looks like https://github.com/everit-org/json-schema is exceptional enough that it would not deserve a generic treatment. And actually, based on their home page, we really have two Package URLs, so this is an even better case:

  • newer pkg:maven/com.github.everit-org.json-schema/[email protected]?repository_url=https://jitpack.io
  • older pkg:maven/com.github.erosb/[email protected] published at Maven Central (and BTW kudos to @erosb for doing this to avoid confusion)
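To show how a `repository_url` qualifier could drive location lookup, here is a small sketch: an explicit qualifier wins, otherwise a per-type default is used. The function name and the defaults table are assumptions made for this example, and the purls use a made-up `org.example` package rather than the ones above.

```python
from urllib.parse import parse_qs

# Illustrative defaults only; a real tool would take these from the purl type definitions.
DEFAULT_REPOSITORIES = {
    "maven": "https://repo.maven.apache.org/maven2",
}


def repository_for(purl):
    """Return the explicit repository_url if present, else the type's default."""
    body, _, query = purl.partition("?")
    query = query.partition("#")[0]  # drop any subpath after '#'
    purl_type = body[len("pkg:"):].split("/", 1)[0]
    explicit = parse_qs(query).get("repository_url")
    if explicit:
        return explicit[0]
    # No qualifier given: fall back to the default (None if the type has none).
    return DEFAULT_REPOSITORIES.get(purl_type)


print(repository_for("pkg:maven/org.example/lib@1.12.1?repository_url=https://jitpack.io"))
print(repository_for("pkg:maven/org.example/lib@1.12.1"))
```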

pombredanne avatar Oct 13 '20 14:10 pombredanne

Hello, maven coordinates of the everit-org/json-schema are quite baroque, mostly because I left everit-org (the company) soon after the first release of the json-schema project. So the history is:

  • it is released as org.everit.json:org.everit.json.schema in range 1.0.0 - 1.5.1 on maven central
  • the newer versions (1.6.0 - 1.12.1) are primarily on jitpack
  • versions 1.9.2 - 1.12.1 are available both on jitpack and maven central, but on maven central the groupId:artifactId is com.github.erosb:everit-json-schema (while on jitpack the same releases are available as org.everit.json:org.everit.json.schema)
  • a JDK6-compatible backport of version 1.9.2 was created by a contributor, which I released with the coordinates com.github.erosb:everit-json-schema-jdk6:1.9.2

I don't expect any tooling to support it :)
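For anyone who wants to model this history anyway, it can be written down as a small lookup table. The sketch below is only an illustration: the table layout and function names are made up, the jitpack coordinates for 1.6.0 to 1.9.1 are left as `None` because they are not stated above, and the JDK6 backport is omitted because its repository is not stated either.

```python
# (first version, last version, repository, groupId:artifactId)
HISTORY = [
    ("1.0.0", "1.5.1", "maven central", "org.everit.json:org.everit.json.schema"),
    ("1.6.0", "1.9.1", "jitpack", None),  # coordinates not stated in the comment
    ("1.9.2", "1.12.1", "jitpack", "org.everit.json:org.everit.json.schema"),
    ("1.9.2", "1.12.1", "maven central", "com.github.erosb:everit-json-schema"),
]


def to_tuple(version):
    """Convert a dotted numeric version such as '1.9.2' to (1, 9, 2)."""
    return tuple(int(part) for part in version.split("."))


def locations(version):
    """List every (repository, coordinates) pair that covers the version."""
    v = to_tuple(version)
    return [
        (repository, coordinates)
        for first, last, repository, coordinates in HISTORY
        if to_tuple(first) <= v <= to_tuple(last)  # inclusive on both ends
    ]


for version in ("1.4.0", "1.7.0", "1.10.0"):
    print(version, locations(version))
```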

erosb avatar Oct 14 '20 08:10 erosb

@david-a-wheeler Please see #139 for an extended take on the same topic, but drafted as a separate mini spec.

pombredanne avatar Nov 30 '21 15:11 pombredanne