kepler
Improve data source importing
Currently the NIST import is too slow, often taking many hours to load the dataset. Looking at the codebase, it appears we're spawning multiple database transactions to import a single entry:
https://github.com/Exein-io/kepler/blob/558afe222b3c21c72a66d26ea1e93695d2c3751c/kepler/src/main.rs#L146-L187
Since many of these entries are completely independent of each other, we should batch insert them into the database in a single transaction (even packing thousands of CVEs at a time):
    INSERT INTO cves (columns)
    VALUES
      (cve_1),
      (cve_2),
      ...
      (cve_n)
    RETURNING *
This results in a single BEGIN/COMMIT per chunk rather than several per CVE. The relational properties still hold within the transaction itself.
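To make the idea concrete, here is a rough sketch of how the chunking could look. It is not the existing kepler code: the `Cve` struct, its two columns, and the helper names `build_batch_insert` and `plan_import` are hypothetical, and it assumes PostgreSQL-style numbered placeholders. It only builds the statements and bind values; actually executing each one inside a transaction is left to whatever database layer is in use.

```rust
/// Hypothetical, simplified CVE record; the real table has more columns.
#[derive(Debug, Clone)]
struct Cve {
    id: String,
    summary: String,
}

/// Builds one multi-row INSERT for a chunk of CVEs, using numbered
/// placeholders ($1, $2, ...), and returns it with the flat bind values.
fn build_batch_insert(chunk: &[Cve]) -> (String, Vec<String>) {
    let mut sql = String::from("INSERT INTO cves (id, summary) VALUES ");
    let mut params = Vec::with_capacity(chunk.len() * 2);
    for (i, cve) in chunk.iter().enumerate() {
        if i > 0 {
            sql.push_str(", ");
        }
        // Two columns per row, so row i uses placeholders 2i+1 and 2i+2.
        sql.push_str(&format!("(${}, ${})", 2 * i + 1, 2 * i + 2));
        params.push(cve.id.clone());
        params.push(cve.summary.clone());
    }
    sql.push_str(" RETURNING *");
    (sql, params)
}

/// Splits the full list into chunks and produces one statement per chunk.
/// `chunk_size` must be greater than zero (`chunks` panics on zero).
fn plan_import(cves: &[Cve], chunk_size: usize) -> Vec<(String, Vec<String>)> {
    cves.chunks(chunk_size).map(build_batch_insert).collect()
}

fn main() {
    let cves: Vec<Cve> = (1..=5)
        .map(|n| Cve {
            id: format!("CVE-2021-{:04}", n),
            summary: format!("summary {}", n),
        })
        .collect();

    // Each batch would be executed inside its own BEGIN/COMMIT.
    for (sql, params) in plan_import(&cves, 2) {
        println!("BEGIN; {}; COMMIT; -- {} bind values", sql, params.len());
    }
}
```

With five entries and a chunk size of two, this plans three statements instead of five or more separate transactions. In practice the chunk size would be in the thousands, bounded by whatever limit the database places on bind parameters per statement.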