Collect EUVD data
See https://euvd.enisa.europa.eu/ Some notes:
- this is based on CIRCL's vulnerability-lookup backend.
- it is not clear what new data it provides
- it does provide new aliases
EUVD API Documentation: https://euvd.enisa.europa.eu/apidoc
Can I take up this issue?
@Samk1710 Yes, sure, you can start working on that.
The listed endpoints cap the number of results per response. The /search endpoint has the highest cap, 100, while the others return only 8. There are 450,789 advisories, served with pagination, so at 100 results per call it would take roughly 4.5k API calls (a large network overhead) to fetch all of them.
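For scale, here is a quick sketch of the arithmetic behind that estimate. The totals and page sizes are the figures quoted in this comment; the helper name `pages_needed` is just illustrative.

```python
import math

TOTAL_ADVISORIES = 450789  # advisory count quoted in this thread
SEARCH_PAGE_SIZE = 100     # maximum results per /search response, as reported here
OTHER_PAGE_SIZE = 8        # maximum for the other endpoints, as reported here


def pages_needed(total, page_size):
    """Return how many paginated calls are needed to cover `total` records."""
    return math.ceil(total / page_size)


print(pages_needed(TOTAL_ADVISORIES, SEARCH_PAGE_SIZE))  # 4508
print(pages_needed(TOTAL_ADVISORIES, OTHER_PAGE_SIZE))   # 56349
```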
How shall we proceed with this importer considering this constraint? Should we use a suitable alternative source?
Do let me know. Thank you.
Hey @Samk1710, we need to clarify the time frame for the rate limit first (e.g., is it 100 requests per minute, per hour, or per day?). We should also avoid using the search endpoint. The main task is to iterate over the advisories and ingest the data into our database. (The network API calls aren’t a problem, but we should use them wisely.)
If there is a better alternative source that provides the same data under an appropriate license, we should use it.
Hey @ziadhany, to fetch all the advisories from the EUVD dataset, we would need to call the /search endpoint about 4,500 times (using size=100, the maximum response size) during the importer's initial run. After that, we can switch to an incremental approach with far fewer calls.
The date-based incremental approach would be:
- Track when we last fetched advisories (last_fetched_date).
- On each run, query the API for records published or updated after that date.
- Process them, then update last_fetched_date.
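A minimal sketch of that incremental flow, just to make the proposal concrete. `fetch_since` and `process` are hypothetical callables (a real importer would query the EUVD API and write to the database); only `last_fetched_date` comes from the steps described here.

```python
from datetime import date


def incremental_import(fetch_since, process, last_fetched_date, today):
    """Run one incremental import and return (new_last_fetched_date, count).

    fetch_since(d) is assumed to yield records published or updated after d;
    process(record) is assumed to store one record. Both are placeholders.
    """
    count = 0
    for advisory in fetch_since(last_fetched_date):
        process(advisory)
        count += 1
    # The marker advances only after every record was processed, so a run
    # that raises midway is retried from the same date next time.
    return today, count
```

For example, calling `incremental_import(fetch, store, date(2024, 2, 1), date.today())` would process only records newer than 2024-02-01 and return the date to persist for the next run.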
We can expect the initial ingestion to take roughly 3-5 hours, and then regular incremental imports going forward.
Do let me know about your thoughts on this. Thanks a lot!
@Samk1710
I don't think we need last_fetched_date. The importer can simply run and pull all the data by iterating over the /search endpoint with fromDate/toDate (or fromUpdatedDate/toUpdatedDate). No switching is necessary if we can fetch everything each time the importer runs.
https://euvdservices.enisa.europa.eu/api/search?fromDate=2023-01-14&toDate=2023-02-14&page=1&size=100
https://euvdservices.enisa.europa.eu/api/search?fromDate=2023-01-14&toDate=2023-02-14&page=2&size=100
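A rough sketch of how that iteration could look, assuming date windows are inclusive on both ends and that a page shorter than `size` marks the end of a window. The function names, the 30-day window, and the injected `fetch_items` callable (expected to return the list of advisories for a URL) are illustrative assumptions, not the importer's actual code. The first page index follows the example URLs and should be confirmed against the API docs.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://euvdservices.enisa.europa.eu/api/search"


def date_windows(start, end, days=30):
    """Yield consecutive (from_date, to_date) pairs covering start..end."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def search_url(from_date, to_date, page, size=100):
    """Build the /search URL for one page of one date window."""
    params = {
        "fromDate": from_date.isoformat(),
        "toDate": to_date.isoformat(),
        "page": page,
        "size": size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def iter_advisories(fetch_items, from_date, to_date, size=100, first_page=1):
    """Yield every item in one date window, requesting page after page.

    fetch_items(url) is a hypothetical callable returning a list of items.
    """
    page = first_page
    while True:
        items = fetch_items(search_url(from_date, to_date, page, size))
        yield from items
        if len(items) < size:  # a short (or empty) page means we are done
            break
        page += 1
```

A full run would loop over `date_windows(...)` and call `iter_advisories(...)` for each window, which keeps every request within the 100-result limit.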
Hey @ziadhany @AyanSinhaMahapatra, are there any updates on the EUVD license?
https://github.com/aboutcode-org/vulnerablecode/pull/2046
Hey, I've raised a PR. I look forward to your review and suggestions for improvement. Thank you!