vulnerability-db
Simplify `vers` ranges
The `cve_index` table currently contains many `vers` ranges with redundant constraints. As per the `vers` specification:
> These pairs of contiguous constraints with these comparators are redundant and invalid (ignoring any `!=` since they can show up anywhere):
>
> - `=`, `<`, or `<=` followed by `<` or `<=`: this is the same as `<` or `<=`
> - `>` or `>=` followed by `=`, `>`, or `>=`: this is the same as `>` or `>=`
The algorithm to simplify is documented in the spec: https://github.com/package-url/purl-spec/blob/version-range-spec/VERSION-RANGE-SPEC.rst#version-constraints-simplification
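For illustration, a minimal Python sketch of the union reading of those two rules; the spec's full algorithm also covers sorting, deduplication, and `!=` handling, so treat this only as a sketch:

```python
# Sketch only: collapse redundant contiguous constraint pairs per the two
# rules above, assuming the constraints are already sorted by version and
# any "!=" constraints were set aside beforehand.

def simplify(constraints):
    simplified = []
    for comparator, version in constraints:
        if simplified:
            prev_comparator, _ = simplified[-1]
            # "=", "<" or "<=" followed by "<" or "<=": keep only the
            # later (larger) upper bound.
            if prev_comparator in ("=", "<", "<=") and comparator in ("<", "<="):
                simplified[-1] = (comparator, version)
                continue
            # ">" or ">=" followed by "=", ">" or ">=": keep only the
            # earlier (smaller) lower bound.
            if prev_comparator in (">", ">=") and comparator in ("=", ">", ">="):
                continue
        simplified.append((comparator, version))
    return simplified


# ">=1.0|>=1.2|<2.0|<=2.0" collapses to ">=1.0|<=2.0"
print(simplify([(">=", "1.0"), (">=", "1.2"), ("<", "2.0"), ("<=", "2.0")]))
```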
I believe the `univers` library supports simplification as well.
Looking at some of the constraints, I wonder whether the current representation is the result of attempting to split ranges into pairs (as demonstrated in the spec)?
In that case, care needs to be taken not to break the meaning of a range by erroneously omitting the corresponding lower or upper bounds. Sadly, an algorithm for splitting is not part of the spec, but @sahibamittal implemented one in `versatile` recently (tests for verification here).
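Not versatile's implementation, just a rough sketch of the invariant such a split has to preserve: after sorting and simplification, each lower bound pairs with the next upper bound, and a leading upper bound or trailing lower bound forms a half-open pair on its own.

```python
# Sketch only, not versatile's algorithm: group a sorted, simplified
# constraint list into (lower, upper) pairs without dropping bounds.

LOWER = (">", ">=")
UPPER = ("<", "<=")


def split_into_pairs(constraints):
    pairs = []
    pending_lower = None
    for comparator, version in constraints:
        if comparator == "=":
            # Exact pins stand alone.
            pairs.append(((comparator, version), (comparator, version)))
        elif comparator in LOWER:
            pending_lower = (comparator, version)
        elif comparator in UPPER:
            # Pair with the pending lower bound; None means unbounded below.
            pairs.append((pending_lower, (comparator, version)))
            pending_lower = None
    if pending_lower is not None:
        # Trailing lower bound: unbounded above.
        pairs.append((pending_lower, None))
    return pairs


# ">=1.0|<2.0|>=3.0" -> [((">=", "1.0"), ("<", "2.0")), ((">=", "3.0"), None)]
print(split_into_pairs([(">=", "1.0"), ("<", "2.0"), (">=", "3.0")]))
```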
@nscuro we had the opposite problem with normalization: implementing version comparison logic that can work with both normalized and non-normalized `vers`. In the end, we kept things simple and duplicated the constraints, so that each `vers` is guaranteed to be correct.
Thanks for the clarification.
Is the problem that `univers` doesn't support all required versioning schemes for the package types that the database pulls in? Or is it that even the schemes `univers` does support break due to garbage data / unconventional versions in the vuln feeds?
Trying to understand the source of the problem, because if vulnerability-db has these issues while building the database, consumers of it will eventually run into the same problems. A majority of the functionality in `vers` is bound to the requirement that versions can be compared and sorted.
@nscuro I have not tried univers yet. The issue is that some of our sources result in a list of tuples, where each tuple is `min`, `max`, `min_excluding`, and `max_excluding`. We have not come up with a logic to convert such a list of tuples, which may itself be unsorted, into a single normalized `vers`. Other than multiple rows, what problem do you foresee with this implementation?
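To make the shape concrete, one such tuple maps onto `vers` comparators roughly like this (field names and the inclusive/exclusive semantics are my assumptions, not the actual schema):

```python
# Sketch: map one (min, max, min_excluding, max_excluding) tuple onto vers
# comparators. Field names are illustrative; min/max are assumed inclusive,
# the *_excluding fields exclusive.

def tuple_to_constraints(min_ver, max_ver, min_excluding, max_excluding):
    constraints = []
    if min_ver:
        constraints.append(f">={min_ver}")
    if min_excluding:
        constraints.append(f">{min_excluding}")
    if max_ver:
        constraints.append(f"<={max_ver}")
    if max_excluding:
        constraints.append(f"<{max_excluding}")
    return constraints


# e.g. a range that is exclusive on both ends
print(tuple_to_constraints(None, None, "2.0.0", "2.4.1"))  # ['>2.0.0', '<2.4.1']
```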
> We have not come up with a logic to convert such a list of tuples, which may itself be unsorted, into a single normalized `vers`.
Ideally this would be done by `univers`. In the simplest case, you should be able to simply "dump" all constraints into a single `vers` range and then let `univers` take care of sorting and simplification. We have a few examples of conversions from GHSA, NVD, and OSV notations here.
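Something along these lines, for example, where the constraints gathered from all tuples are joined into one `vers` string and handed to `univers`; whether parsing also sorts and simplifies is exactly the part that would need verifying:

```python
# Sketch: join the constraints gathered from several tuples into a single
# vers string and hand it to univers. Assumes the constraints are already
# sorted by version.

from univers.version_range import VersionRange
from univers.versions import PypiVersion

constraints = [">2.0.0", "<2.4.1", ">=3.0.0", "<3.0.7"]
vers_string = "vers:pypi/" + "|".join(constraints)

version_range = VersionRange.from_string(vers_string)
print(PypiVersion("2.2.0") in version_range)  # expected: True
print(PypiVersion("2.5.0") in version_range)  # expected: False
```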
Do you have some examples of tuples handy where this would not be feasible?
> Other than multiple rows, what problem do you foresee with this implementation?
Less a problem, more of an idealistic view that the vulnerability database should provide normalized data instead of having each client perform normalization. After all, normalization is required to evaluate a range. If the database is already unable to perform this task, the chance of consumers failing to do it is high.
@nscuro, different sources could be describing the same component but offering slightly different versions. Normalizing across them would require database UPDATE calls, which will be quite slow, since the indexes are added only at the end of the creation process.
@nscuro, here is an example I came across where normalization is quite difficult, since the version ranges across the sources do not agree.
In that case it might make sense to track the source for each range, so we can attribute wrong ranges to where they came from. And then potentially apply manual corrections for those sources.
How do you handle that data right now in depscan? If you were to consider all rows, you'd be getting quite a few FPs at the moment.
@nscuro, yup. This is the idea behind an attribute called `matching_vers` in search.py. But, of course, it is up to the consuming application to decide on the logic.
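As a consumer-side illustration only (this is not the search.py logic, and the row shape / field names are assumptions):

```python
# Consumer-side sketch, not the search.py logic. Row shape and field
# names are assumptions for illustration.

from univers.version_range import VersionRange
from univers.versions import PypiVersion

rows = [
    {"cve_id": "CVE-2024-0001", "source": "nvd", "vers": "vers:pypi/>=1.0.0|<1.4.2"},
    {"cve_id": "CVE-2024-0001", "source": "osv", "vers": "vers:pypi/<1.5.0"},
]
installed = PypiVersion("1.4.5")

# One possible policy: report the CVE only for sources whose range
# actually contains the installed version.
matching_sources = [
    row["source"]
    for row in rows
    if installed in VersionRange.from_string(row["vers"])
]
print(matching_sources)  # ['osv'] under this policy
```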
depscan currently uses the old VDB5 database and then filters the results based on CVE IDs alone. Perhaps the vulnerabilities array could be used to present the data from all the sources?