
Utilize NVD API instead of data feed

Open jeremylong opened this issue 1 year ago • 4 comments

The NVD will be retiring the NVD data feeds in 2023; see their announcement of changes to feeds and APIs. ODC needs to migrate to the NVD's API.

Current concerns:

  1. How will we support offline users?
  2. Users of the API will require an API key due to rate limiting.

jeremylong avatar Aug 05 '22 09:08 jeremylong

Users of the API will require an API key

Are you sure the key is mandatory? The following sentence from their announcement reads to me like it is optional.

users transmitting requests without a key will see a reduction in the number of requests they can make in a rolling 60 second window.

-> it will work without key, but slower?

marcelstoer avatar Aug 10 '22 14:08 marcelstoer

users transmitting requests without a key will see a reduction in the number of requests they can make in a rolling 60 second window.

-> it will work without key, but slower?

I think it will not work at all once the rate limit is hit; requests will simply error out with an HTTP 429. The same thing is already happening with the OSS Index API. So, let me take this opportunity to address a couple of other concerns with using APIs:

  • Just like we're seeing with the OSS Index, without an account things get rate limited (HTTP 429) and our jobs fail. Creating an account might "solve" it, but what is the rate limit for users with an account? Who says Sonatype won't change it overnight (if their service gets more and more popular), causing our jobs to fail tomorrow? Is the same going to happen with the NVD API?
  • Another concern is speed. A local database can only be an issue when initially downloaded or updated, and even then it is easy to cache somewhere. What about these APIs? How are we going to get some kind of SLA out of them? API temporarily down = failing job?
  • Finally, with an API we are sending information to some system out there, including, for example, the names of internal artifacts. Who says I want to disclose such information? Granted, right now we are trusting the NVD database, but that is a pull rather than a push: we pull in data files, and at worst they contain false CVE information, making the check useless.

To summarize: there is nothing this (great!) tool can do about it, but I have the feeling that using APIs will render this tool less suitable for certain types of projects than it is today.
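For what it's worth, a client can at least soften the HTTP 429 failure mode with retries and exponential backoff rather than failing the job on the first rejected request. A minimal sketch (`request_fn` is a stand-in for whatever actually calls the API; the delays and retry count are arbitrary, not values from the NVD):

```python
import time

def with_backoff(request_fn, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call request_fn(); on a rate-limit response (HTTP 429), wait and
    retry with exponential backoff instead of failing the build outright.

    request_fn must return a (status_code, body) tuple.
    """
    for attempt in range(max_retries + 1):
        status, body = request_fn()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # 2s, 4s, 8s, ... between attempts
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError("NVD API still rate-limited after retries")
```

This doesn't remove the concern (a hard outage still fails the job after the retries are exhausted), but it turns brief rate-limit windows into a delay instead of a broken build.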

koen-serneels avatar Aug 12 '22 08:08 koen-serneels

@koen-serneels you perfectly summarized the concerns I myself have regarding the use of the NVD API.

nothing this (great!) tool can do about it

I was wondering if it might somehow be possible for this project to continue providing feed files (e.g. hosted here on GitHub). Could an ODC project account that is not subject to rate limiting regularly pull data off the NVD API and create those feed files?
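As an illustration of that idea, such a (purely hypothetical) mirror job could bucket the records it pulls from the API by CVE year and publish one gzipped file per year, mimicking the layout of the retiring per-year feeds. The record shape here is an assumption, not the actual NVD schema:

```python
import gzip
import json
from collections import defaultdict

def group_by_year(cves):
    """Bucket CVE records by the year in their ID
    (e.g. CVE-2021-12345 -> "2021"), mirroring the
    per-year layout of the retired NVD data feeds."""
    feeds = defaultdict(list)
    for cve in cves:
        year = cve["id"].split("-")[1]
        feeds[year].append(cve)
    return feeds

def write_feed(records, path):
    """Write one year's bucket as a gzipped JSON feed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump({"CVE_Items": records}, f)
```

Offline users could then keep downloading static files from the mirror instead of each hitting the API individually.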

marcelstoer avatar Aug 12 '22 11:08 marcelstoer

My thoughts go more along the lines of the current datastream usage: find some way to cache the full historical CVE information within the fenced environment and periodically refresh it by pulling all intermediate updates from the NVD API. One of their APIs, the cves API, appears at first glance to be usable as a replacement for the cveModified stream, with a more targeted retrieval of all updates that have not yet been seen.

The main open issue I see is bootstrapping the CVE database using the API, as I expect that will run into the API rate limit quite fast. The gzipped datastreams by CVE year provided a clean solution for that (and could nicely be cached in the fenced environment to bootstrap the local CVE databases of developers); API-based retrieval of the same data volume feels like API abuse.
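Assuming the cves API's documented last-modified and paging parameters (`lastModStartDate`, `lastModEndDate`, `startIndex`, `resultsPerPage`) and its published endpoint, the targeted refresh above could be sketched as building one query per page of the "modified since last sync" window; the same page size also gives a rough feel for how many requests a from-scratch bootstrap would cost:

```python
from urllib.parse import urlencode

# Assumed NVD CVE API 2.0 endpoint
NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def modified_since_url(last_sync_iso, now_iso, start_index=0, page_size=2000):
    """Build one page of an incremental-refresh query: only CVEs whose
    last-modified timestamp falls in the window since the previous sync."""
    params = {
        "lastModStartDate": last_sync_iso,
        "lastModEndDate": now_iso,
        "startIndex": start_index,
        "resultsPerPage": page_size,
    }
    return f"{NVD_CVE_API}?{urlencode(params)}"

def pages_needed(total_results, page_size=2000):
    """Number of paged requests a full pull would take -- useful to reason
    about the rate limit when bootstrapping from scratch."""
    return -(-total_results // page_size)  # ceiling division
```

For example, a full bootstrap of a hypothetical 220,000 CVEs at 2,000 per page is about 110 requests, which under a rate limit measured in requests per rolling window stretches the initial download considerably compared to fetching a handful of per-year gzip files.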

I wonder whether the NVD changes might be triggered by excessive data load from (cloud-hosted?) build environments that dispose of cached data and retrieve the entire vulnerability dataset from scratch on every build/scan.

aikebah avatar Aug 13 '22 13:08 aikebah

I wonder whether the NVD changes might be triggered by excessive data load from (cloud-hosted?) build environments that dispose of cached data and retrieve the entire vulnerability dataset from scratch on builds/scans.

@aikebah I've often wondered how much of the load on the NVD has been caused by this project...

jeremylong avatar Sep 24 '22 14:09 jeremylong