osv.dev
Add parity checking (if you don't already do it)
Is your feature request related to a problem? Please describe.
Originally we thought we needed to call the batch-query API, and implemented code to do that.
But then it turned out OSV expands version ranges into exhaustive lists of versions, so we don't actually need any help from the API--we just need the vulns .json files. Which is great!
However, it made us kind of nervous to stop calling the API. What if OSV stopped expanding the version ranges? What if they added some additional logic in the API that we'd have to duplicate?
To be absolutely sure, we implemented parity checking: once every twelve hours we run a job that enables API calls, generates vuln reports against a baseline of dependencies that have vulns, does the same with API calls disabled, and then compares the two reports.
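Roughly, the comparison step looks something like this (a minimal sketch with made-up names and placeholder data, much simplified from what our plugin actually does):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Minimal sketch of the parity comparison. Each report maps a Maven coordinate
// ("group:artifact:version") to the set of OSV IDs found for it, one report
// built with API calls enabled and one built from the offline .json data.
final class ParityCheck {

  /** Returns the coordinates whose vuln sets differ between the two reports. */
  static Set<String> mismatches(Map<String, Set<String>> apiReport,
                                Map<String, Set<String>> zipReport) {
    Set<String> allCoordinates = new HashSet<>(apiReport.keySet());
    allCoordinates.addAll(zipReport.keySet());

    Set<String> differing = new HashSet<>();
    for (String coordinate : allCoordinates) {
      if (!Objects.equals(apiReport.get(coordinate), zipReport.get(coordinate))) {
        differing.add(coordinate);
      }
    }
    return differing;
  }

  public static void main(String[] args) {
    // Placeholder data; the real job runs against our baseline of vulnerable dependencies.
    Map<String, Set<String>> api = Map.of("com.example:lib:1.0.0", Set.of("GHSA-24rp-q3w6-vc56"));
    Map<String, Set<String>> zip = Map.of("com.example:lib:1.0.0", Set.of("GHSA-24rp-q3w6-vc56"));
    System.out.println("mismatching coordinates: " + mismatches(api, zip)); // prints an empty set
  }
}
```

In the sketch, a non-empty result means the two sources disagree for those coordinates.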
Do you do something similar? If not, it might be a good idea to add it. For example, parity-checking would have caught the lack of updates reported in https://github.com/google/osv.dev/issues/1981.
We can probably get approval to give you our vulns finder Gradle plugin source code, if you're interested. But it's written in Java, which you may not be interested in.
Describe the solution you'd like
Add parity checking if it's not already being done.
Hi @jimshowalter, I'm not entirely sure I follow what the request is here, but I'll try to paraphrase my understanding.
Are you basically asking if we do a consistency check between what the API returns for a given vulnerability and what is exported to GCS?
If that's the request, then no, we don't currently do that, but it's a very interesting proposal and wouldn't be too difficult to implement.
OSV.dev is entirely open source; the easiest entry points to peruse what is and isn't tested are probably:
- https://github.com/google/osv.dev/blob/master/Makefile
- https://github.com/google/osv.dev/blob/master/cloudbuild.yaml
We have a list of dependencies that have vulns.
Submitting the list of dependencies to the API returns a list of vuln reports.
Our vulns finder no longer calls the API. Instead it matches Maven coordinates (group, artifact, version) to the names and versions in the .json vuln files in Maven-all.zip.
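In sketch form, the zip-side matching looks roughly like this (Gson is used here purely for illustration; the class and method names are made up, not our actual plugin code):

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

// Rough illustration: decide whether one OSV record (one .json file from the
// Maven zip) applies to a given Maven coordinate, relying on the expanded
// "versions" lists rather than interpreting the "ranges" data ourselves.
final class OsvRecordMatcher {

  static boolean matches(String recordJson, String group, String artifact, String version) {
    String mavenName = group + ":" + artifact; // OSV names Maven packages "group:artifact"
    JsonObject record = JsonParser.parseString(recordJson).getAsJsonObject();
    if (!record.has("affected")) {
      return false;
    }
    for (JsonElement affectedElement : record.getAsJsonArray("affected")) {
      JsonObject affected = affectedElement.getAsJsonObject();
      JsonObject pkg = affected.getAsJsonObject("package");
      if (pkg == null
          || !"Maven".equals(pkg.get("ecosystem").getAsString())
          || !mavenName.equals(pkg.get("name").getAsString())) {
        continue;
      }
      if (affected.has("versions")) {
        for (JsonElement v : affected.getAsJsonArray("versions")) {
          if (version.equals(v.getAsString())) {
            return true;
          }
        }
      }
    }
    return false;
  }
}
```

Relying on the expanded versions lists is the whole point: it means we never have to implement Maven version-range semantics ourselves.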
The parity check involves calling the API and comparing the vuln reports it returns to the vulns that were matched in the zip.
Because the results match exactly, it means that OSV's version-range-to-versions conversion works perfectly, at least for Maven ecosystems (and for the subset of dependencies in our test baseline).
For what it's worth, the progression was: first we used the API for everything; then we realized we could cut down on calls by using the batch query to get just the vuln IDs and look them up in the zip contents; and then we realized we could just use the zip (with a daily parity check just to be sure).
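For the intermediate step, the batch query call was roughly this shape (placeholder coordinate; the real code batches hundreds of queries, handles errors, and then resolves the returned IDs against the zip contents):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Rough sketch of the intermediate approach: ask the batch-query endpoint for
// vuln IDs only, then look the full records up in the already-downloaded zip.
final class BatchQueryExample {

  public static void main(String[] args) throws Exception {
    // Placeholder coordinate; swap in a real dependency from the build.
    String body = """
        {"queries": [
          {"package": {"ecosystem": "Maven", "name": "com.example:example-lib"},
           "version": "1.2.3"}
        ]}""";

    HttpRequest request = HttpRequest.newBuilder(URI.create("https://api.osv.dev/v1/querybatch"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());

    // The response contains only IDs (plus modified timestamps); the full
    // records are then read from the per-ecosystem zip export.
    System.out.println(response.body());
  }
}
```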
> For what it's worth, the progression was: first we used the API for everything; then we realized we could cut down on calls by using the batch query to get just the vuln IDs and look them up in the zip contents; and then we realized we could just use the zip
Was there a specific requirement that the API wasn't meeting that made you go down this path?
The API is fine, but we have hundreds of modules in a big repo, each with hundreds of dependencies, and latency adds up. Using the zip, we can find vulns in the repo in under a second. With the API, it takes over a minute, and that's on a fast network. We have employees all over the world, some with slow internet.
> The parity check involves calling the API and comparing the vuln reports it returns to the vulns that were matched in the zip.
> Because the results match exactly, it means that OSV's version-range-to-versions conversion works perfectly, at least for Maven ecosystems (and for the subset of dependencies in our test baseline).
So thinking this through a bit, I think we could achieve the desired result by comparing
https://api.osv.dev/v1/vulns/GHSA-24rp-q3w6-vc56 with https://osv-vulnerabilities.storage.googleapis.com/Maven/GHSA-24rp-q3w6-vc56.json
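Something along these lines, as a rough sketch (fetch both and compare the parsed JSON rather than the raw bytes; Gson is used here just for illustration):

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonParser;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Rough sketch: fetch the same record from the API and from the GCS export and
// compare them structurally (Gson's parsed JsonObject/JsonArray implement deep
// equality, so formatting and object field order don't matter).
final class ApiVsGcsCheck {

  public static void main(String[] args) throws Exception {
    String id = "GHSA-24rp-q3w6-vc56";
    HttpClient client = HttpClient.newHttpClient();

    JsonElement fromApi = fetchJson(client, "https://api.osv.dev/v1/vulns/" + id);
    JsonElement fromGcs = fetchJson(client,
        "https://osv-vulnerabilities.storage.googleapis.com/Maven/" + id + ".json");

    System.out.println("records match: " + fromApi.equals(fromGcs));
  }

  private static JsonElement fetchJson(HttpClient client, String url) throws Exception {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    return JsonParser.parseString(response.body());
  }
}
```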
Knowing how our infrastructure works, this is a bit of a moot point. The API isn't doing anything particularly clever in terms of version expansion; it's just serving up the record as it is stored in Cloud Datastore.
Furthermore, the exporter is just exporting that same data from Cloud Datastore to GCS.
So if the goal is to ensure that GCS exports are consistent with API output, that should hold true by virtue of the same backing storage. If the goal is to ensure that independently implemented version matching logic against the GCS records behaves the same as the API, that's a bit more of a contrived scenario?
It sounds like there's no need for parity checking because of the way it already works. Parity checking could, however, detect if something changes in how it works. If it's moot, let's close this.
This issue has not had any activity for 60 days and will be automatically closed in two weeks
Automatically closing stale issue