kepler Improve data source importing

Open JuxhinDB opened this issue 1 year ago • 0 comments

Currently the NIST importing functionality is too slow, often taking many hours to import the dataset. Looking into the codebase, it looks like we're spawning multiple database transactions in order to import a single entry:

https://github.com/Exein-io/kepler/blob/558afe222b3c21c72a66d26ea1e93695d2c3751c/kepler/src/main.rs#L146-L187

Since a lot of these entries are completely independent of each other, we should batch insert them into the database in a single transaction (even packing thousands of CVEs at a time).

INSERT INTO cves (columns)
VALUES
    (cve_1),
    (cve_2),
    ...
    (cve_n)
RETURNING *

This will result in a single BEGIN/COMMIT per chunk rather than several per CVE. The relational properties still hold within the transaction itself.
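As a rough sketch of what the chunking could look like on the Rust side (not the actual kepler code): the snippet below builds one parameterized multi-row INSERT per chunk using only the standard library. The `CveRow` struct, its two columns, and the function names are hypothetical and only for illustration. It also assumes PostgreSQL-style `$n` placeholders. Executing the statements inside a transaction is left to whatever database layer is in use.

```rust
/// Hypothetical, simplified CVE row; the real schema has different columns.
#[derive(Debug, Clone)]
pub struct CveRow {
    pub cve_id: String,
    pub summary: String,
}

/// Builds one parameterized multi-row INSERT for a chunk of rows.
/// Returns the SQL text plus the flattened bind values in placeholder order,
/// or `None` for an empty chunk (an INSERT with no rows is not valid SQL).
pub fn build_batch_insert(chunk: &[CveRow]) -> Option<(String, Vec<&str>)> {
    if chunk.is_empty() {
        return None;
    }
    let mut sql = String::from("INSERT INTO cves (cve_id, summary) VALUES ");
    let mut params = Vec::with_capacity(chunk.len() * 2);
    for (i, row) in chunk.iter().enumerate() {
        if i > 0 {
            sql.push_str(", ");
        }
        // Assumed PostgreSQL-style numbered placeholders: ($1, $2), ($3, $4), ...
        sql.push_str(&format!("(${}, ${})", i * 2 + 1, i * 2 + 2));
        params.push(row.cve_id.as_str());
        params.push(row.summary.as_str());
    }
    sql.push_str(" RETURNING *");
    Some((sql, params))
}

/// Splits all rows into chunks and builds one statement per chunk.
/// A chunk size of 0 is treated as 1, since `chunks(0)` would panic.
pub fn build_all_batches(rows: &[CveRow], chunk_size: usize) -> Vec<(String, Vec<&str>)> {
    rows.chunks(chunk_size.max(1))
        .filter_map(build_batch_insert)
        .collect()
}
```

Each returned statement would then be executed once, so the number of round trips scales with the number of chunks rather than the number of CVEs. The chunk size needs to keep the total number of bind parameters per statement within whatever limit the database imposes.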

Sep 19 '23 12:09 JuxhinDB
