kuzu
Add external node table
Description
This is our initial PR to support executing Cypher directly on a relational database. It contains the logic for node tables only. Major changes include:
Nested catalog entry.
We add a new catalog entry type, `ExternalNodeTable`, which has a nested entry structure. At the parent level, it maintains the logical view of properties, which aligns with the columns in the relational table. At the child level, it maintains another catalog entry which contains only the primary key property. This child entry aligns with our physical storage.
In the current design, we still need to materialize the primary key and use it as the join condition whenever we read a property that does not exist in storage.
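The nested structure can be pictured with a minimal Python sketch. This is an illustration only, not the code in this PR: the class names `Property`, `NodeTableEntry`, and `ExternalNodeTableEntry`, the helper `make_external_node_table`, and the `_pk` suffix are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Property:
    name: str
    data_type: str


@dataclass
class NodeTableEntry:
    """Child entry (hypothetical): mirrors physical storage, primary key only."""
    name: str
    primary_key: Property


@dataclass
class ExternalNodeTableEntry:
    """Parent entry (hypothetical): logical view matching the relational columns."""
    name: str
    properties: List[Property]
    physical_entry: NodeTableEntry

    def is_stored(self, property_name: str) -> bool:
        # Only the primary key is materialized in our own storage.
        return property_name == self.physical_entry.primary_key.name

    def needs_external_scan(self, property_names: Iterable[str]) -> bool:
        # Any property that is not stored must be read from the external
        # table and joined back on the primary key.
        return any(not self.is_stored(name) for name in property_names)


def make_external_node_table(
    name: str, columns: List[Tuple[str, str]], primary_key_name: str
) -> ExternalNodeTableEntry:
    properties = [Property(col, dtype) for col, dtype in columns]
    primary_key = next(p for p in properties if p.name == primary_key_name)
    child = NodeTableEntry(name + "_pk", primary_key)
    return ExternalNodeTableEntry(name, properties, child)
```

With an entry built from columns `id` and `content` and primary key `id`, reading only `id` can be answered from storage, while reading `content` requires the external scan and join.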
Scan external table
When we run `MATCH (a:label)`, where `label` points to an external relational table, we need to scan both the external relational table and the primary key column materialized in our storage, and then join the two on the primary key.
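The scan-then-join step can be sketched as a simple hash join in Python. This is a conceptual sketch under assumptions, not the operator implemented in this PR: the function name `scan_external_node_table` is hypothetical, external rows are modeled as dictionaries, and using the position of a stored key as its node offset is an assumption made for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple


def scan_external_node_table(
    stored_primary_keys: Sequence[Any],
    external_rows: Sequence[Dict[str, Any]],
    pk_column: str,
) -> List[Tuple[int, Dict[str, Any]]]:
    # Build side: index the materialized external scan result by primary key.
    rows_by_pk = {row[pk_column]: row for row in external_rows}

    # Probe side: walk the primary key column held in our own storage.
    # The position in that column stands in for the node offset (assumption).
    joined = []
    for offset, pk in enumerate(stored_primary_keys):
        row = rows_by_pk.get(pk)
        if row is not None:
            joined.append((offset, row))
    return joined
```

Each output pair links a node in our storage to the full set of columns read from the external table, which is what allows properties that are not stored locally to be returned.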
Some sanity-check benchmark numbers:
Setup: LDBC10 Comment table stored in a DuckDB database, 8 threads.
- DuckDB native scanning: 0.3s
- Kuzu scanning DuckDB: 2s
- Scanning external database: copy primary key (6s) + join (3s) = 9s
Being slower than DuckDB is expected, since we first need to materialize DuckDB's result and then re-scan it. The major overhead is in our scan of the DuckDB result; @acquamarin should see if we can optimize this further.
Another bottleneck is copying the primary key. I'm fairly confident I can bring the time down to ~2s with some optimization.
Fixes # (issue)
Contributor agreement
- [x] I have read and agree to the Contributor Agreement.