
Collect EUVD data

Open pombredanne opened this issue 7 months ago • 9 comments

See https://euvd.enisa.europa.eu/ Some notes:

  • this is based on CIRCL's vulnerability-lookup backend.
  • it is not clear what new data it provides
  • it does provide new aliases

pombredanne avatar May 13 '25 18:05 pombredanne

EUVD API Documentation: https://euvd.enisa.europa.eu/apidoc

ziadhany avatar May 14 '25 16:05 ziadhany

Can I take up this issue?

Samk1710 avatar Nov 17 '25 15:11 Samk1710

@Samk1710 Yes, sure, you can start working on that.

ziadhany avatar Nov 17 '25 15:11 ziadhany

EUVD API Documentation: https://euvd.enisa.europa.eu/apidoc

The endpoints listed there each have a maximum response size. The /search endpoint has the highest limit at 100 results per call, while the others allow only 8. Roughly 450,789 advisories are available behind pagination, so at 100 results per call we would need about 4,500 API calls (significant network overhead) to fetch all the advisories.

How shall we proceed with this importer considering this constraint? Should we use a suitable alternative source?

Do let me know. Thank you.
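To make the call-count estimate above concrete, here is a trivial sketch of the arithmetic (the total of 450,789 advisories and the size=100 limit come from the API listing mentioned above):

```python
import math

def pages_needed(total_records: int, page_size: int) -> int:
    """Number of paginated API calls needed to fetch all records."""
    return math.ceil(total_records / page_size)

# ~450,789 advisories at the /search maximum of size=100:
pages_needed(450_789, 100)  # 4508 calls
```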

Samk1710 avatar Nov 18 '25 15:11 Samk1710

Hey @Samk1710, we need to clarify the time frame for the rate limit first (e.g., is it 100 requests per minute, per hour, or per day?). We should also avoid using the search endpoint. The main task is to iterate over the advisories and ingest the data into our database. (The network API calls aren’t a problem, but we should use them wisely.)

If there is a better alternative source that provides the same data under an appropriate license, we should use it.

ziadhany avatar Nov 19 '25 13:11 ziadhany

Hey @ziadhany To fetch all the advisories from the EUVD dataset, we would need to iterate over the /search endpoint about 4,500 times (using size=100, the maximum response limit) during the initial run of the importer. After that, we can switch to an incremental approach with far fewer calls.

The date-based incremental approach would be:

  • Track when we last fetched advisories (last_fetched_date).
  • On each run, query the API for records published or updated after that date.
  • Process them, then update last_fetched_date.
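A minimal sketch of that loop, with fetch_advisories_since and process as stand-ins for the real EUVD API call and the database ingestion (both names are hypothetical, not actual vulnerablecode APIs):

```python
from datetime import date

def run_incremental_import(state: dict, fetch_advisories_since, process) -> dict:
    """Fetch records published/updated since the last run, then advance the cursor.

    `state` holds the persisted last_fetched_date; `fetch_advisories_since(since,
    until)` stands in for the paginated EUVD API call; `process(advisory)` stands
    in for ingesting one advisory into the database.
    """
    since = state.get("last_fetched_date", date(2020, 1, 1))  # default is arbitrary
    today = date.today()
    for advisory in fetch_advisories_since(since, today):
        process(advisory)
    state["last_fetched_date"] = today  # only advance after a successful run
    return state
```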

We can expect the initial ingestion to take roughly 3-5 hours, with much smaller incremental imports going forward.

Do let me know about your thoughts on this. Thanks a lot!

Samk1710 avatar Nov 21 '25 20:11 Samk1710

@Samk1710

I don't think we need last_fetched_date. The importer can simply run and pull all the data by iterating over the /search endpoint with fromDate/toDate (or fromUpdatedDate/toUpdatedDate). No switching is necessary if we can fetch everything each time the importer runs.

https://euvdservices.enisa.europa.eu/api/search?fromDate=2023-01-14&toDate=2023-02-14&page=1&size=100
https://euvdservices.enisa.europa.eu/api/search?fromDate=2023-01-14&toDate=2023-02-14&page=2&size=100
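A sketch of paging through one date window, based on the example URLs above. The query parameter names (fromDate, toDate, page, size) come from those URLs; the "items" key in the response and the stop condition are assumptions, since the response schema isn't documented in this thread. get_json stands in for an HTTP call such as requests.get(...).json():

```python
EUVD_SEARCH_URL = "https://euvdservices.enisa.europa.eu/api/search"

def iter_search_window(get_json, from_date: str, to_date: str, size: int = 100):
    """Yield every advisory in [from_date, to_date] by walking the paginated /search endpoint.

    `get_json(url, params)` stands in for an HTTP GET returning parsed JSON.
    The "items" response key is an assumption about the payload shape.
    """
    page = 1
    while True:
        data = get_json(
            EUVD_SEARCH_URL,
            params={"fromDate": from_date, "toDate": to_date, "page": page, "size": size},
        )
        items = data.get("items", [])
        if not items:
            break
        yield from items
        if len(items) < size:  # short page means we reached the end
            break
        page += 1
```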

ziadhany avatar Nov 22 '25 14:11 ziadhany

Hey @ziadhany @AyanSinhaMahapatra Is there any update on the license of the EUVD data?

Samk1710 avatar Nov 25 '25 15:11 Samk1710

https://github.com/aboutcode-org/vulnerablecode/pull/2046

Hey, I've raised a PR. I'd welcome review and suggestions for improvement. Thank you!

Samk1710 avatar Nov 26 '25 11:11 Samk1710