
combine-to-osv: Include the CWE(s) from the underlying CVE in the resultant OSV record

Open timothee-chauvin opened this issue 8 months ago • 3 comments

I'm working on a vulnerability detection benchmark using OSV as the main data source. Having as many CWE root causes as possible would be useful for the project. Yet I notice that few CWEs seem to make it into OSV.

Many CWEs are included in, for example, cvelistV5:

$ cd /tmp
$ git clone https://github.com/CVEProject/cvelistV5.git
$ cd cvelistV5
$ fd '.json$' . | wc -l
252803
$ rg '"CWE-' . | wc -l
126266

Yet there are few in OSV:

$ cd /tmp
$ mkdir osv && cd osv
$ wget https://osv-vulnerabilities.storage.googleapis.com/GIT/all.zip
$ unzip all.zip
$ fd '.json$' . | wc -l
31413
$ rg '"CWE-' . | wc -l
135
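
Note that `rg ... | wc -l` counts matching lines, and a single record can contribute more than one line (for instance both a `description` and a `cweId` line). A per-record count may be a fairer comparison. The following is a minimal sketch, assuming every `.json` file under a directory is one record; the function name `count_records_with_cwe` is hypothetical, and the regex simply mirrors the `"CWE-` pattern used with `rg`:

```python
import re
from pathlib import Path

# Mirrors the rg pattern: a double quote immediately followed by CWE-<digits>.
CWE_PATTERN = re.compile(r'"CWE-\d+')


def count_records_with_cwe(root):
    """Return (total_json_files, files_mentioning_a_cwe) under root.

    Hypothetical helper: it does a plain text search per file, so it makes
    no assumption about which JSON field holds the CWE.
    """
    total = 0
    with_cwe = 0
    for path in Path(root).rglob("*.json"):
        total += 1
        text = path.read_text(encoding="utf-8", errors="replace")
        if CWE_PATTERN.search(text):
            with_cwe += 1
    return total, with_cwe
```

Running it once on the `cvelistV5` checkout and once on the unzipped OSV directory gives two `(total, with_cwe)` pairs that can be compared directly as ratios.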

Here is one arbitrary example of an entry that has a CWE in the CVE data but not in OSV:

$ rg -C4 '"CWE-' cvelistV5/cves/2023/4xxx/CVE-2023-4696.json
69-                    "descriptions": [
70-                        {
71-                            "type": "CWE",
72-                            "lang": "en",
73:                            "description": "CWE-284 Improper Access Control",
74:                            "cweId": "CWE-284"
75-                        }
76-                    ]
77-                }
78-            ],
$ rg 'CWE' osv/CVE-2023-4696.json
# no match

The only OSV items in the GIT ecosystem that have a CWE are CURL-CVE-* items, and a few PSF-* items.

Is there a reason for this?

For now, it seems that the best way for me to get the CWEs is to also clone the cvelistV5 repository in parallel and get them from there.
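
That workaround could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested pipeline: it assumes the CVE JSON 5.x layout in which CWEs sit under `containers.cna.problemTypes[].descriptions[].cweId` (and optionally under entries of `containers.adp`), and it assumes the `cves/<year>/<N>xxx/` directory layout inferred from the example path above. The helper names `extract_cwe_ids` and `cwe_ids_for` are hypothetical.

```python
import json
from pathlib import Path


def extract_cwe_ids(record):
    """Collect CWE IDs from a parsed CVE JSON 5.x record (a dict).

    Assumption: CWEs appear as "cweId" inside problemTypes/descriptions
    of the CNA container and, if present, of each ADP container.
    """
    containers = record.get("containers", {})
    # "cna" is a single object; "adp", when present, is a list of objects.
    blocks = [containers.get("cna", {})] + list(containers.get("adp", []))
    found = []
    for block in blocks:
        for problem_type in block.get("problemTypes", []):
            for desc in problem_type.get("descriptions", []):
                cwe_id = desc.get("cweId")
                if cwe_id and cwe_id not in found:
                    found.append(cwe_id)  # keep first-seen order, no duplicates
    return found


def cwe_ids_for(cve_id, cvelist_root):
    """Load one record from a cvelistV5 checkout and return its CWE IDs.

    Assumption: records live at cves/<year>/<N>xxx/<CVE-ID>.json, where the
    bucket is the sequence number with its last three digits replaced by xxx.
    """
    _, year, number = cve_id.split("-")
    bucket = number[:-3] + "xxx"
    path = Path(cvelist_root) / "cves" / year / bucket / f"{cve_id}.json"
    with open(path, encoding="utf-8") as f:
        return extract_cwe_ids(json.load(f))
```

For the example above, `cwe_ids_for("CVE-2023-4696", "/tmp/cvelistV5")` would look in `cves/2023/4xxx/`. Records whose problem type is free text without a `cweId` field are skipped by this sketch.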

Might they be included in OSV at some point in the future?

timothee-chauvin · Jun 06 '24 10:06