
Switch to using cve 2.0

Open terriko opened this issue 6 months ago • 10 comments

We're currently using cve 1.1 data from the mirror, but there are several years' worth of data that are only available in the cve 2.0 directory, so we should probably switch to that format going forwards. Not sure off the top of my head what this will take, but there were some schema changes.

terriko avatar Jun 24 '25 21:06 terriko

Tagging @joydeep049 @JigyasuRajput -- if you're currently blocked by our CI, this might be a thing to work on.

terriko avatar Jun 24 '25 21:06 terriko

Sorry for getting to this so late. Apparently my GitHub mail service is down, so I didn't get an email saying I was tagged. Looking into this.

joydeep049 avatar Jun 25 '25 18:06 joydeep049

Also, I have no idea what to look at to make this change. @terriko @anthonyharrison @mastersans

joydeep049 avatar Jun 25 '25 18:06 joydeep049

@joydeep049 I think there were some format changes and some parameter names changed.

I would suggest you get 20 records from each version and compare them to find the differences, then write a translator to reformat a cve 1.1 record into a cve 2.0 record.
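To give a flavour of what such a translator might look like: this is a hedged sketch that maps one entry from the NVD 2.0 `vulnerabilities` array into the 1.1-style `CVE_Items` shape. The field paths (`descriptions`, `metrics.cvssMetricV31`, `CVE_data_meta`, etc.) reflect the public NVD schemas as I understand them; `to_legacy` is a hypothetical name, and you should verify every path against real records from both feeds before relying on this.

```python
# Hedged sketch: translate one NVD 2.0 "vulnerability" entry into the
# 1.1-style "CVE_Item" shape existing code may expect. Field paths are
# from the public NVD schemas as understood here -- verify against real data.

def to_legacy(vuln: dict) -> dict:
    cve = vuln["cve"]
    # 2.0 uses a flat "descriptions" list; 1.1 nested it under
    # description.description_data.
    description = next(
        (d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"),
        "",
    )
    # 2.0 puts CVSS v3 under metrics.cvssMetricV31 (or cvssMetricV30);
    # 1.1 had it under impact.baseMetricV3.cvssV3.
    metrics = cve.get("metrics", {})
    v3 = metrics.get("cvssMetricV31") or metrics.get("cvssMetricV30") or []
    impact = {}
    if v3:
        impact["baseMetricV3"] = {"cvssV3": v3[0]["cvssData"]}
    return {
        "cve": {
            "CVE_data_meta": {"ID": cve["id"]},  # 2.0 renamed this to just "id"
            "description": {
                "description_data": [{"lang": "en", "value": description}]
            },
        },
        "impact": impact,
        # 2.0 configurations is a list of node containers; 1.1 wrapped the
        # nodes in a single dict.
        "configurations": {
            "nodes": [
                n for c in cve.get("configurations", []) for n in c.get("nodes", [])
            ]
        },
    }
```

Going the other way (1.1 into 2.0) would be the mirror image of the same mapping.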

anthonyharrison avatar Jun 25 '25 19:06 anthonyharrison

Where I'd start:

  1. Change the url in nvd_source.py to use cve/2.0 instead of cve/1.1 and see what breaks.
  2. Look up whatever thing it errors out on and figure out where that piece of data went in the new data structure. I usually just open the json file and search; fields often keep their names and only change location in the data structure. Sometimes you'll actually need to read, though.
  3. Fix the code so that when it's reading a 2.0 data structure it looks in the right place.
  4. Repeat 2-3 until it starts working.

I did step 1 myself just to see if it was going to be easy so I know there's at least 1 piece of data that isn't in the expected place, but I don't really know how bad it'll be beyond that. We only keep maybe a dozen pieces of data, though, so it should be pretty doable.
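For step 2, one quick way to find where a field moved without reading the whole file: recursively search a parsed record for a key name and print every path where it occurs. This is an illustrative helper (not part of cve-bin-tool); run it on a 1.1 record and a 2.0 record and diff the paths.

```python
# Illustrative helper for step 2 above: given a parsed JSON document,
# return every dotted path where a key (e.g. "baseScore") appears, so you
# can see where a field moved between the 1.1 and 2.0 layouts.

def find_paths(obj, key, path=""):
    hits = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            p = f"{path}.{k}" if path else k
            if k == key:
                hits.append(p)
            hits.extend(find_paths(v, key, p))  # keep descending: key may nest
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            hits.extend(find_paths(v, key, f"{path}[{i}]"))
    return hits
```

For example, `find_paths(json.load(open("nvdcve-2.0-2024.json")), "baseScore")` would list every location of a CVSS base score in a 2.0 feed file.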

(I've got J working on fixing the mirror so the cve 1.1 data won't be broken, but if the 1.1 data is never updated again, that's a temporary fix at best.)

terriko avatar Jun 25 '25 20:06 terriko

Hi! Given this issue hasn't had any updates since June, and given that I needed CVEs from before 2018 in cve-bin-tool for my use case, I went ahead and tried to fix it in PR #5265. That was a month ago; it ran fine back then, and the PR passed all CI checks.

However, at some point in August, the mirror started to have missing files under https://v4.mirror.cveb.in/nvd/json/cve/2.0. Namely, it is missing the .json.gz for 2006 and the meta files for 2002, 2003 and 2004. This caused a test to fail (and normal usage would fail as well), so the PR got a failing check. It still has one.

I don't know who maintains this mirror or why it is now missing a few files, nor what the proper course of action is. But I would definitely like a bit of guidance from the maintainers, or from whoever can say how far cve-bin-tool should go to cope with missing files.

For instance, test/test_cvedb.py::TestCVEDB::test_refresh_nvd_json tests that the json files were properly downloaded. That is what it is meant to test, because that's definitely one of the expected results of cvedb.refresh(). But in doing so it also implicitly asserts that the mirror offers all files from 2002 to the current year. Should a unit test pass or fail according to file availability on an external server? I don't know. If it passes one day and fails the next, is there anything the tested function could or should do about it?
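One common pattern for this dilemma is to probe the external resource first and skip (rather than fail) when the server itself is the problem, so the test only fails on actual code bugs. Below is a hedged sketch of the decision logic; `classify_probe` is a hypothetical helper, not existing cve-bin-tool code, and the zero-length check anticipates exactly the kind of empty files the mirror has served.

```python
# Hedged sketch: separate "external mirror problem" from "code bug" in a
# test. classify_probe() is a hypothetical helper deciding what the test
# should do after a HEAD request to the mirror.
from typing import Optional


def classify_probe(status: Optional[int], content_length: Optional[int]) -> str:
    """Return "skip" when the mirror looks unhealthy, "run" otherwise.

    status is the HTTP status (None if the request errored out);
    content_length is the reported size (None if the header was absent).
    """
    if status is None or status != 200:
        return "skip"  # mirror unreachable or file missing: not our bug
    if content_length == 0:
        return "skip"  # zero-length file: bad data upstream, also not our bug
    return "run"       # file looks plausible; run the real assertions


# Usage inside a pytest test (illustrative):
#   if classify_probe(resp.status, resp.length) == "skip":
#       pytest.skip("mirror missing or serving an empty file; external outage")
```

The trade-off: skipping hides genuine download bugs whenever the mirror happens to be down, so some projects instead run such tests in a separate, non-blocking CI job.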

gluesmith2021 avatar Sep 08 '25 02:09 gluesmith2021

The mirror is being maintained by @warthog9

If I had to guess, it's related to the weirdness we had with cloudflare sometimes just throwing out random crap instead of the correct files. Not sure what to do about it -- we're effectively mirroring something that throws out a lot of bad data and I don't know that there's a solution for that that we can do consistently on our end? Like, someone please fund NVD? Or replace them with something that doesn't have infrastructure problems? There's a bunch of efforts going on but none of them has emerged as a clear winner I don't think. If anyone's got any bright ideas for dealing with the bad data better I'm happy to have more ideas (although fair warning: I have very little time for implementing or even code reviewing them as I am no longer paid to maintain this project).

terriko avatar Sep 22 '25 17:09 terriko

Some context: the data we've been getting out of NVD has not been great of late. The last round of problems involved them literally handing us 0-length files, case in point:

-rw-r--r--. 1 root root    0 Sep 22 13:42 nvdcve-2.0-2007.json.gz
-rw-r--r--. 1 root root    0 Sep 22 13:42 nvdcve-2.0-2007.json.zip

Which is not fun and is problematic to deal with; there's some previous discussion elsewhere in the tickets here on that.
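On the client side, one defensive option is to validate a downloaded feed before accepting it: reject anything empty, anything without the gzip magic bytes, and anything that fails to decompress (which also catches truncation). This is a hedged sketch of such a guard, not existing cve-bin-tool code.

```python
# Hedged sketch of a client-side guard against the zero-length (or
# otherwise truncated) .json.gz files described above.
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_valid_gzip(data: bytes) -> bool:
    """Return True only if data is a non-empty, fully decompressible gzip blob."""
    if len(data) == 0 or not data.startswith(GZIP_MAGIC):
        return False
    try:
        # Decompress fully to catch truncated archives, not just bad headers.
        gzip.GzipFile(fileobj=io.BytesIO(data)).read()
        return True
    except (OSError, EOFError):
        return False
```

A downloader could retry or fall back to the .zip variant when this check fails, instead of caching the bad file.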

Right now it seems to have settled down; check and see if it's acting closer to what you are expecting @gluesmith2021

warthog9 avatar Sep 22 '25 22:09 warthog9

Thank you @warthog9 , both for the information and the resolution. It seems to be working fine now. I updated my PR yesterday (a simple upstream merge) to re-trigger the CI, and it's pending maintainer approval for workflows.

gluesmith2021 avatar Oct 02 '25 14:10 gluesmith2021

Give a shout if it starts acting wonky again

warthog9 avatar Oct 02 '25 17:10 warthog9