The OSV database contains 100% of vulnerabilities from NVD/CVE since 2016 that are determined to relate to OSS

Open andrewpollock opened this issue 2 years ago • 11 comments

We need to generate OSV records from historical and future CVE records in the NVD that we can determine to relate to Open Source Software.

These records will be keyed by commit.

A side-effect of this is we will start picking up vulnerabilities in C/C++ packages.

andrewpollock avatar Oct 24 '22 03:10 andrewpollock

I wonder if it would be possible to somehow exclude/mark duplicates? For example, https://nvd.nist.gov/vuln/detail/CVE-2021-45941 was presumably generated automatically based on https://osv.dev/vulnerability/OSV-2021-1576 and I don't think it makes much sense to show them both without any links to each other. I think it should also help to somewhat address https://github.com/google/osv.dev/issues/258.

evverx avatar Dec 16 '22 12:12 evverx

Any updates on this?

slonka avatar Jan 23 '23 08:01 slonka

Hi Krzysztof, not sure if you're asking for an update with respect to the work overall, or an update in relation to https://github.com/google/osv.dev/issues/783#issuecomment-1354670824 specifically?

I can give a progress update for interested parties:

  • "low hanging fruit" CVE records are convertible to OSV today, but not productionized
  • I'm able to derive a reasonably exhaustive set of OSS repositories for CVEs, using the ones from 2022 as my test bench
  • I'm now focusing on mapping the versions named in CVEs to commits via Git tags, to derive fix commits for CVEs where the fix commit isn't directly evident from the record.
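To illustrate the version-to-commit mapping in the last bullet, here is a minimal Python sketch. It is not the code in vulnfeeds; the function names, the normalization rule, and the tag-to-commit table (with placeholder hashes) are all assumptions made for the example.

```python
import re
from typing import Dict, Optional


def normalize_tag(tag: str) -> str:
    """Reduce a tag such as 'v1.2.3' or 'release_1_2_3' to '1.2.3'.

    Assumption: the first run of digits separated by '.' or '_' is the version.
    """
    match = re.search(r"\d+(?:[._]\d+)*", tag)
    return match.group(0).replace("_", ".") if match else ""


def commit_for_version(version: str, tags: Dict[str, str]) -> Optional[str]:
    """Return the commit whose normalized tag equals the CVE's version, if any."""
    wanted = normalize_tag(version)
    if not wanted:
        return None
    for tag, commit in tags.items():
        if normalize_tag(tag) == wanted:
            return commit
    return None


# Hypothetical tag -> commit table, e.g. parsed from `git ls-remote --tags`.
# The hashes are placeholders, not real commits.
TAGS = {
    "v1.2.2": "1111111111111111111111111111111111111111",
    "release_1_2_3": "2222222222222222222222222222222222222222",
}

fix_commit = commit_for_version("1.2.3", TAGS)
```

This naive normalization treats a pre-release tag like `1.2.3-rc1` the same as `1.2.3`, so a real converter would need stricter matching.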

andrewpollock avatar Jan 24 '23 22:01 andrewpollock

Hi Andrew, Thanks for the update! 🙂 I was just wondering if this was actively worked on and if there is anything that can be split into smaller tasks so people can help out.

slonka avatar Jan 25 '23 09:01 slonka

Yes, it's very much being actively worked on. I will look at defining and sharing some milestones, help is always welcome :-)

Everything's currently in a proof-of-concept stage, as I familiarise myself with the input data. Feel free to poke at https://github.com/google/osv.dev/tree/master/vulnfeeds/cpp (I think the README file needs an update)

andrewpollock avatar Jan 30 '23 02:01 andrewpollock

This issue is overdue for a status update.

Work has been progressing well, and the most recent conversion runs are yielding the following results from the 2022 NVD CVE data set:

  • 15,698 CVEs are determined to relate to applications
  • 6,787 of these CVEs have one or more Git repositories identified for them
  • 5,579 of these CVEs with one or more Git repositories are successfully converting to OSV records

There's much validation work still to be done, before expanding to previous years as well as automating ongoing conversion.

andrewpollock avatar May 08 '23 00:05 andrewpollock

Another overdue status update

We're close (targeting early October) to going live with the data currently available. CVEs from 2016 through 2023 are being processed.

From the 2023 NVD CVE data set:

  • 11,728 CVEs are determined to relate to applications
  • 5,201 of these CVEs have one or more Git repositories identified for them
  • 3,752 of these CVEs with one or more Git repositories are successfully converting to OSV records
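For readers who want the conversion funnel as percentages, a small Python sketch using only the counts reported in this thread for the 2022 and 2023 data sets (the `funnel` helper and its key names are made up for the example):

```python
def funnel(applications: int, with_repo: int, converted: int) -> dict:
    """Express each stage of the conversion funnel as a percentage."""
    return {
        # Share of application CVEs with at least one Git repository identified.
        "repo_identified_pct": round(100 * with_repo / applications, 1),
        # Share of CVEs with a repository that convert to OSV records.
        "converted_of_repo_pct": round(100 * converted / with_repo, 1),
        # Share of all application CVEs that convert to OSV records.
        "converted_overall_pct": round(100 * converted / applications, 1),
    }


# Counts taken from the status updates in this issue.
stats_2022 = funnel(15698, 6787, 5579)
stats_2023 = funnel(11728, 5201, 3752)

print("2022:", stats_2022)
print("2023:", stats_2023)
```

On these figures, roughly 43% to 44% of application CVEs have a repository identified in both years, while the share of those that convert is about 82% for 2022 and about 72% for 2023.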

andrewpollock avatar Sep 29 '23 01:09 andrewpollock

Early access update for followers of this issue 👋

We've soft-launched this into OSV.dev production overnight, and we now have 31,889 NVD CVE-based records covering 2016 through 2023 in OSV.dev, e.g.:

  • https://osv.dev/vulnerability/CVE-2023-4863
  • https://osv.dev/vulnerability/CVE-2023-26130
  • https://osv.dev/vulnerability/CVE-2021-44228

This enables much broader commit hash-based vulnerability scanning, e.g.

curl -d \
  '{"commit": "227d2c20509f85a394133e2be6d0b0fc1fda54b2"}' \
  "https://api.osv.dev/v1/query" | jq '.vulns | map(.id)'
[
  "CVE-2023-26130"
]
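The same query can be sketched in Python with only the standard library. The endpoint and request body are the ones from the curl command above; the helper names (`build_request`, `vuln_ids`, `query_commit`) are assumptions for this example, and an empty response object is treated as "no matches".

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_request(commit: str) -> urllib.request.Request:
    """Build the POST request for a commit-hash query (no network access)."""
    body = json.dumps({"commit": commit}).encode("utf-8")
    return urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vuln_ids(response: dict) -> list:
    """Extract record IDs, mirroring the jq filter '.vulns | map(.id)'."""
    return [vuln["id"] for vuln in response.get("vulns", [])]


def query_commit(commit: str) -> list:
    """Send the query to the API and return the matching record IDs."""
    with urllib.request.urlopen(build_request(commit), timeout=30) as resp:
        return vuln_ids(json.load(resp))
```

Calling `query_commit("227d2c20509f85a394133e2be6d0b0fc1fda54b2")` performs the same lookup as the curl example.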

Any individual record feedback can be filed using this issue template

We'll now continue to iterate on three groups of records: those that didn't convert but look like they should have (weeding out any false positives from this set and addressing any remaining deficiencies); those that converted but did not import successfully; and those that were discarded because they only half-converted and would have caused a false positive.

andrewpollock avatar Oct 11 '23 04:10 andrewpollock

Hi @andrewpollock, thank you very much for the update! Is this only available via the REST API, or does it already work with osv-scanner?

slonka avatar Nov 02 '23 07:11 slonka

Hi @slonka, as of https://github.com/google/osv-scanner/releases/tag/v1.4.3, which went out a few hours ago, the functionality is fully available in OSV-Scanner directly, as well as via the REST API.

andrewpollock avatar Nov 02 '23 07:11 andrewpollock

Our blog post announcing this has just been published: https://osv.dev/blog/posts/introducing-broad-c-c++-support/

oliverchang avatar Nov 07 '23 05:11 oliverchang