Add mostly universal version range spec draft
Originally at https://github.com/nexB/univers/pull/11
This is a work in progress for "vers", a new mostly universal version range specification to use as a companion to purl. This is a possible solution to these issues and PRs:
- https://github.com/package-url/purl-spec/issues/66
- https://github.com/package-url/purl-spec/issues/84
- https://github.com/package-url/purl-spec/pull/93
- https://github.com/nexB/vulnerablecode/issues/119
- https://github.com/nexB/vulnerablecode/issues/140
Signed-off-by: Philippe Ombredanne [email protected]
It comes with an experimental implementation in Python at https://github.com/nexB/univers/
Overall I like the idea! It seems to me that most of the time purl and vers would provide different functionality for different use cases. However, is there a case you see having purl be able to also incorporate vers through a field (such as in qualifiers)?
@ashcrow re:
Overall I like the idea! It seems to me that most of the time purl and vers would provide different functionality for different use cases. However, is there a case you see having purl be able to also incorporate vers through a field (such as in qualifiers)?
Yes, using a qualifier in a purl would be one way. I think (but need to double-check) that this is possible without encoding anything in the vers version range: I picked the vers component separators so that they are mostly obvious, commonly used, and do not collide with the purl separators if the two were used together. There may still be some ugliness in URL encoding, though, as @coderpatros mentioned in https://github.com/package-url/purl-spec/issues/84#issuecomment-891717667
Note that this is essentially the proposal of @mprpic in https://github.com/package-url/purl-spec/issues/66#issuecomment-700210017
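To make the URL-encoding concern concrete, here is a minimal sketch using Python's standard urllib.parse. The "vers" qualifier name and the resulting purl string are hypothetical (nothing in this draft defines them), and the sketch encodes every reserved character, which may be stricter than what purl actually requires:

```python
from urllib.parse import quote, unquote

# A range written in the draft's notation.
vers = "vers:tomee/>=7.0.0|<7.0.7|7.0.7"

# Percent-encode everything outside the unreserved set
# (letters, digits, "-", ".", "_", "~").
encoded = quote(vers, safe="")

# Hypothetical: "vers" is NOT a defined purl qualifier; it is used here
# for illustration only, as is the pkg:apache/tomee base purl.
purl_with_range = "pkg:apache/tomee?vers=" + encoded
print(purl_with_range)

# Decoding restores the original range unchanged.
assert unquote(encoded) == vers
```

The comparators and the "|" separator all end up percent-encoded, which is the ugliness in question: the range survives the round trip but is no longer readable at a glance.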
@david-a-wheeler @copernico @joshbressers @sbs2001 @Hritik14 @bwillis @coderpatros @jhutchings1 @brianf @jbmaillet ... ping... you all have been involved in the discussions that led to this. Your feedback is badly needed.
@kerberosmansour @johnmod3 @erosb you had chimed in on this topic too. Your feedback is welcomed!
This feels like a good idea to me
I like that we can specify explicit versions that are affected or not affected. There will be instances where trying to list ranges will be harder than just listing the specific affected versions.
Long term I imagine we will want to keep a catalog of known versioning-scheme identifiers. I assume "semver" will end up being the default if no other ecosystem fits (should it be? I currently think yes, but I've only thought about it for a few minutes). In my mind I would compare this catalog of identifiers to how SPDX has a list of known licenses. If you want to add a new one, you can submit an issue or PR and discuss it.
I don't have time right now to do this (I might at the end of the month if nobody gets to it first), but I think putting an Examples section at the bottom could make understanding this easier for casual readers.
Thanks for advancing this draft @pombredanne ! Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time? I don't particularly care which one, but having an unbounded set of them means interop could be a challenge for consumers.
cc: @katecatlin @rschultheis @andrewbredow @reiddraper
@joshbressers wrote:
This feels like a good idea to me
Thank you for the kind encouragements!
I like that we can specify explicit versions that are affected or not affected. There will be instances where trying to list ranges will be harder than just listing the specific affected versions.
Long term I imagine we will want to keep a catalog of known versioning-scheme identifiers. I assume "semver" will end up being the default if no other ecosystem fits (should it be? I currently think yes, but I've only thought about it for a few minutes). In my mind I would compare this catalog of identifiers to how SPDX has a list of known licenses. If you want to add a new one, you can submit an issue or PR and discuss it.
Exactly. Though I came to appreciate that "semver" may be more of an unreachable dream than a reality. For instance, Ruby's semver is not semver. Composer's semver is not semver, and I am not even sure that node-semver is strictly semver either. Also, semver has no notation for ranges. This spec could help there.
I don't have time right now to do this (I might at the end of the month if nobody gets to it first), but I think putting an Examples section at the bottom could make understanding this easier for casual readers.
Good point: I will add a bunch of examples!
@jhutchings1 re:
Thanks for advancing this draft @pombredanne ! Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time? I don't particularly care which one, but having an unbounded set of them means interop could be a challenge for consumers.
Ideally, I'd want to recommend the proposed unified "vers" notation together with a strict semver version syntax
@jhutchings1 re:
Thanks for advancing this draft @pombredanne ! Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time? I don't particularly care which one, but having an unbounded set of them means interop could be a challenge for consumers.
Ideally, I'd want to recommend the proposed unified "vers" notation together with a strict semver version syntax
Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time: maybe here?
I don't like that the <= constraint has been dropped.
< can include future versions of a component that haven't been released yet.
@coderpatros Thanks!
Can you elaborate on this:
< can include future versions of a component that haven't been released yet.
And about
I don't like that the <= constraint has been dropped.
- Would you have an example of a case that would be difficult to handle?
- Is the (involved) example below what you have in mind?
For example let's look at CVE-2020-11969. I picked mostly at random looking for something recent enough and with multiple release branches living in parallel.
See https://nvd.nist.gov/vuln/detail/CVE-2020-11969 described as:
If Apache TomEE is configured to use the embedded ActiveMQ broker, and the broker URI includes the useJMX=true parameter, a JMX port is opened on TCP port 1099, which does not include authentication.
This affects Apache TomEE 8.0.0-M1 - 8.0.1, Apache TomEE 7.1.0 - 7.1.2, Apache TomEE 7.0.0-M1 - 7.0.7, Apache TomEE 1.0.0 - 1.7.5.
Apache TomEE is interesting because it defines a semver-like version policy at: https://tomee.apache.org/tomee-9.0/docs/tomee-version-policies.html
Yet this policy does not state that it is semver, and based on the release history at https://tomee.apache.org/download-archive.html it is clear that semver is not applied exactly, as some versions use four dotted segments. How to compare versions with pre-release and build segments is not specified either, but these are used.
Let's assume for now and for this example these things:
- the base purl for the main distro package is pkg:apache/tomee (there are also Maven JARs like at https://repository.apache.org/content/groups/snapshots/org/apache/tomee/apache-tomee/9.0.0-M8-SNAPSHOT/ BTW but we will leave these out)
- tomee is using a strict semver versioning scheme with no specific range notation. We will name this versioning scheme "tomee" for now.
We have expanded NVD CPE ranges at https://nvd.nist.gov/vuln/detail/CVE-2020-11969/cpes?expandCpeRanges=true
The first thing that is clear is that things are not clear:
For the range: From (including) 7.0.0 Up to (including) 7.0.7, the text says rather "7.0.0-M1 - 7.0.7". These are possibly two conflicting statements: an M1 pre-release in semver would always come before 7.0.0. Note also that the computed CPE ranges use a lowercase m1 while the text and upstream use the uppercase M1.
The vers would be vers:tomee/>=7.0.0|<7.0.7|7.0.7 yet if the tomee versions
are compared using semver, then 7.0.0-m1, 7.0.0-m2 and 7.0.0-m3 would sort before
7.0.0 and not be part of the range. But the NVD includes them (I have no idea
where the code that computed these version ranges is, if any, or whether these
ranges are hand curated BTW).
Alternatively the vers may be vers:tomee/>=7.0.0-M1|<7.0.7|7.0.7.
This seems to better match the version timeline where the M milestones always come
before the actual major release.
All these ambiguities and quirks are what we are trying to fix, aren't we?
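The ordering issue can be checked with a small sketch of semver 2.0.0 precedence. This is only an illustration under the assumption made above that tomee versions follow strict semver with three numeric segments; semver_key is a hypothetical helper, not part of univers, and it would fail on the four-segment TomEE versions:

```python
def semver_key(version):
    """Return a sort key following semver 2.0.0 precedence rules."""
    # Build metadata (after "+") is ignored for precedence.
    version = version.split("+", 1)[0]
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A final release sorts after all of its pre-releases.
        return (major, minor, patch, 1, ())
    identifiers = []
    for ident in prerelease.split("."):
        if ident.isdigit():
            # Numeric identifiers compare numerically and sort first.
            identifiers.append((0, int(ident), ""))
        else:
            # Alphanumeric identifiers compare as ASCII strings.
            identifiers.append((1, 0, ident))
    return (major, minor, patch, 0, tuple(identifiers))


def between(version, low, high):
    """True if low <= version <= high under semver precedence."""
    return semver_key(low) <= semver_key(version) <= semver_key(high)


# With a lower bound of 7.0.0 the M1 milestone falls outside the range...
print(between("7.0.0-M1", "7.0.0", "7.0.7"))
# ...while a lower bound of 7.0.0-M1 includes it.
print(between("7.0.0-M1", "7.0.0-M1", "7.0.7"))
```

Note that under these rules "M1" and "m1" are different identifiers, so the lowercase m1 in the CPE ranges and the uppercase M1 used upstream would not compare as equal.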
The original post by @jgallimore is at http://mail-archives.us.apache.org/mod_mbox/www-announce/202006.mbox/CAGRgoZgmn_+KXxGnf7SGEHU3zBDJJDsARY8iL-4t+vj_6JkbtQ@mail.gmail.com and mirrors the CVE details.
That said, for the full range, the vers could be the more risque:
- vers:tomee/>=1.0.0|<1.7.5|1.7.5|>=7.0.0-M1|<7.0.7|7.0.7|>=7.1.0|<7.1.2|7.1.2|>=8.0.0-M1|<8.0.1|8.0.1
- or maybe this: vers:tomee/>=1.0.0-beta1|<1.7.5|1.7.5|>=7.0.0-M1|<7.0.7|7.0.7|>=7.1.0|<7.1.2|7.1.2|>=8.0.0-M1|<8.0.1|8.0.1
Again there is some ambiguity wrt. how tomee deals with semver-like pre-release and build. The release timeline suggests they follow semver overall. The NVD ranges for the CVE are less than clear to me.
Are you saying that:
vers:tomee/>=1.0.0-beta1|<1.7.5|1.7.5|>=7.0.0-M1|<7.0.7|7.0.7|>=7.1.0|<7.1.2|7.1.2|>=8.0.0-M1|<8.0.1|8.0.1
would be better represented this way?
vers:tomee/>=1.0.0-beta1|<=1.7.5|>=7.0.0-M1|<=7.0.7|>=7.1.0|<=7.1.2|>=8.0.0-M1|<=8.0.1
@jbmaillet you wrote in https://github.com/package-url/purl-spec/pull/139#issuecomment-989646917
Ideally, I'd want to recommend the proposed unified "vers" notation together with a strict semver version syntax
Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time: maybe here https://github.com/ossf/scorecard ?
@laurentsimon @azeemshaikh38 @inferno-chromium @naveensrinivasan @chrismcgehee @dlorenc FYI, gentle ping ^ ... since you seem to be the main authors behind https://github.com/ossf/scorecard
IMHO a scoring criterion would be for a project/package to have a defined versioning policy: semver being best, but anything that is clearly defined and unambiguous would be fine. Without anything defined, vulnerable range resolution is a random fishing expedition for the downstream users.
@jbmaillet you wrote in #139 (comment)
Ideally, I'd want to recommend the proposed unified "vers" notation together with a strict semver version syntax
Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time: maybe here https://github.com/ossf/scorecard ?
@laurentsimon @azeemshaikh38 @inferno-chromium @naveensrinivasan @chrismcgehee @dlorenc FYI, gentle ping ^ ... since you seem to be the main authors behind https://github.com/ossf/scorecard
IMHO a scoring criterion would be for a project/package to have a defined versioning policy: semver being best, but anything that is clearly defined and unambiguous would be fine. Without anything defined, vulnerable range resolution is a random fishing expedition for the downstream users.
Thanks for tagging us. Looks like a good idea to unify range definition :-) ... as everyone already said. Where do you see scorecard fitting in the picture? FYI, we're also working on guidelines for OSS supply-chain security best practices (ultimately to be used by scorecard and other use cases), and this may be a good place to add recommendations. (The initiative is part of the OSSF's Best Practices for Open Source Developers working group: https://github.com/ossf/wg-best-practices-os-developers)
FYI, the same working group is also looking at a way for developers to attach metadata to their repo (think of it as an extension to SECURITY.md), and there may be scope to add such a field there if not already considered.
cc @scovetta @david-a-wheeler
@pombredanne if you want to represent a version range that includes the current latest version it's a bit wonky using < as a constraint.
I guess you can just bump the patch version and use that as your constraint. But then, like in the CVE-2020-11969 example above, it will include pre-release versions of that next patch version. Which isn't really what I want.
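The side effect of bumping can be shown with a short sketch. It assumes a simplified semver-like ordering in which a pre-release sorts before its final release; key and bump_patch are hypothetical helpers, and 8.0.2-M1 is an invented version used only for illustration:

```python
def key(version):
    """Simplified ordering: numeric core first, pre-releases before the release."""
    core, _, pre = version.partition("-")
    numbers = tuple(int(n) for n in core.split("."))
    # pre == "" is True for a final release, and False sorts before True.
    return (numbers, pre == "", pre)


def bump_patch(version):
    """Return the next patch version of a plain MAJOR.MINOR.PATCH version."""
    major, minor, patch = (int(n) for n in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


# 8.0.2-M1 is a made-up milestone of the next patch release.
candidates = ["8.0.1", "8.0.2-M1", "8.0.2"]

# What "<=8.0.1" selects.
inclusive = [v for v in candidates if key(v) <= key("8.0.1")]

# What the bumped "<8.0.2" selects instead.
bumped = [v for v in candidates if key(v) < key(bump_patch("8.0.1"))]

print(inclusive)
print(bumped)
```

The two selections differ only by the milestone of the next patch version, which is exactly the unwanted extra match described above.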
@pombredanne if you want to represent a version range that includes the current latest version it's a bit wonky using < as a constraint.
I guess you can just bump the patch version and use that as your constraint. But then, like in the CVE-2020-11969 example above, it will include pre-release versions of that next patch version. Which isn't really what I want.
If a vulnerability is not patched, then the '<' condition can be omitted entirely and added later as the patch is made available. It seems to me that encoding version ranges using '<=' can be error prone here, as it assumes the very next version will definitely fix the vulnerability.
But it's very possible I don't understand all the use cases of this range notation :) If an equivalent to '<=' is still needed, this can be encoded by combining '<' with the '=' operator which is still supported as @pombredanne pointed out in the examples above.
Would that work?
@oliverchang yeah I'm thinking about other use cases outside of vulnerability information. Things like existing package manifest version constraints and resolved version ranges during package restore and build processes.
For vulnerability information I very much like the < constraint.
When I looked at the current PR the <= constraint had been removed.
Rather than defining this as a URI string I think it might be better to have this as an IANA registered URN namespace.
Basically what has already been proposed, but prefixed with urn:. It would still likely have the same encoding issues with characters like < and >.
And I need to re-read the relevant RFCs. It might not really qualify as a URN.
@coderpatros @oliverchang re:
yeah I'm thinking about other use cases outside of vulnerability information. Things like existing package manifest version constraints and resolved version ranges during package restore and build processes.
Actually the simplified syntax is working OK, but the reduced set of operators is problematic or verbose for several use cases, in particular for dependencies.
It also makes it much harder to implement converters from a native range to this range notation, as this requires "reducing" richer range statements to a simpler notation. This is already needed in all cases for the "gem" and "npm" tilde and caret operators. Requiring a reduction for most operators is demanding a lot of all implementations.
You could say that your use case is vulnerabilities only, hence you do not care for dependency ranges. But these are also part of this spec and they cannot be ignored even for a vulnerable range-only use case. For instance say you have a vulnerable range for package foo and that package bar depends on foo with another version range: here you need a way to check how and if the two ranges intersect to determine if the bar versions usable with foo are vulnerable. This would need to have a way to get an accurate vers for both.
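As a rough illustration of the foo/bar case, the sketch below filters a list of known foo releases against both ranges. It is a simplification, not the vers algorithm: versions are plain dotted integers, each range is a single interval given as (comparator, version) pairs, and all package names, versions and ranges are made up:

```python
import operator

# Comparators supported by this sketch.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def parse(version):
    """Turn '1.2.0' into (1, 2, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, constraints):
    """True if the version satisfies every (comparator, bound) pair."""
    return all(OPS[op](parse(version), parse(bound)) for op, bound in constraints)


# Hypothetical data: releases of foo, its vulnerable range, and the range
# of foo versions that bar accepts as a dependency.
foo_releases = ["1.0.0", "1.1.0", "1.2.0", "1.3.0", "2.0.0"]
vulnerable = [(">=", "1.1.0"), ("<", "1.3.0")]
bar_requires = [(">=", "1.2.0"), ("<", "2.0.0")]

# foo versions that bar can use and that are vulnerable.
at_risk = [v for v in foo_releases
           if in_range(v, bar_requires) and in_range(v, vulnerable)]

# foo versions that bar can use and that are not vulnerable.
safe = [v for v in foo_releases
        if in_range(v, bar_requires) and not in_range(v, vulnerable)]

print(at_risk, safe)
```

This only works because both ranges are expressed in the same notation and compared with the same ordering, which is the point made above about needing an accurate vers for both.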
And @coderpatros you wrote:
I guess you can just bump the patch version and use that as your constraint. But then, like in the CVE-2020-11969 example above, it will include pre-release versions of that next patch version. Which isn't really what I want.
The problem is that you cannot generally and reliably bump versions unless you have a versioning scheme that supports it. This would be true for node-semver for npm and Rubygems because they need to support this for their tilde ranges, but this is not something that is common otherwise. Therefore in the general case I think that an upper range of < is not able to represent accurately all existing ranges. You would need to add <= because bumping is not possible. And then the same may apply to >= and therefore you need to add a >, which means we need <, <=, > and >=. And IMHO adding back != helps keep the notation easier to read and easier to convert: you could write >1.2.3|<1.2.3 but this is more involved.
There is another benefit to using !=, <, <=, >, >=: you can now validate and require that any version appears only once in the range spec, irrespective of its comparator. This makes ranges overall cleaner and easier to process and read.
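The "each version appears only once" rule lends itself to a mechanical check. A minimal sketch, assuming the vers:<scheme>/<constraint>|<constraint> layout used in the examples above; repeated_versions and its parsing are hypothetical and not taken from the univers implementation:

```python
import re
from collections import Counter

# An optional comparator followed by the version text.
CONSTRAINT = re.compile(r"^(>=|<=|!=|<|>|=)?(.+)$")


def repeated_versions(vers):
    """Return the sorted versions that appear more than once in a vers string."""
    scheme, _, constraints = vers.partition("/")
    if not scheme.startswith("vers:") or not constraints:
        raise ValueError(f"not a vers string: {vers!r}")
    versions = []
    for constraint in constraints.split("|"):
        match = CONSTRAINT.match(constraint.strip())
        if not match:
            raise ValueError(f"empty constraint in: {vers!r}")
        # Keep only the version, dropping the comparator.
        versions.append(match.group(2))
    counts = Counter(versions)
    return sorted(v for v, n in counts.items() if n > 1)


# The reduced notation has to repeat 7.0.7 to express "<=7.0.7".
print(repeated_versions("vers:tomee/>=7.0.0-M1|<7.0.7|7.0.7"))
# With "<=" available every version shows up once.
print(repeated_versions("vers:tomee/>=7.0.0-M1|<=7.0.7"))
```

A validator built this way can reject the first form outright once the richer comparator set is allowed.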
@coderpatros re:
For vulnerability information I very much like the < constraint.
When I looked at the current PR the <= constraint had been removed.
Therefore, and based on all this, I am adding back these comparators !=, <, <=, >, >=, but still keeping the simplified timeline "signposts" design. Let me push this for your review in a few
The latest version now has examples and extensive pseudocode. I have reverted to the richer set of comparators and provided an extensive rationale. Feedback wanted!
IMHO this is ready to merge and use. Unless there is an objection I will likely merge this this week. I would appreciate a formal OK though. @ashcrow @sschuberth @stevespringett @iamwillbar Everyone involved is mucho welcome to give their approval!
@pombredanne Just wanted to check in and see if this one's getting near acceptance/merging? Thanks!
@jhutchings1 yes, this is mostly ready to merge. There are a few points that I would like to clarify and I will do this by Monday! For instance, there are some specifics wrt. NuGet handling of version ranges that may require introducing a "*" in the syntax, as they may not otherwise be easily resolvable to a simplified version expression. And a few minor points emerged from practical experimentation.