Add external node table

Open andyfengHKU opened this issue 6 months ago • 2 comments

Description

This is our initial PR to support executing Cypher directly on a relational database. It contains the logic for node tables only. Major changes include:

Nested catalog entry.

We add a new catalog entry type, ExternalNodeTable, which has a nested entry structure. At the parent level, it maintains the logical view of properties, which aligns with the columns of the relational table. At the child level, it maintains another catalog entry that contains only the primary key property. This child entry aligns with our physical storage.

In the current design, we still need to materialize the primary key and use it as the join condition when we try to read a property that does not exist in our storage.
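To make the nesting concrete, here is a minimal Python sketch of the two-level entry described above. It is only an illustration, not the code in this PR: the class names, field names, and the `is_materialized` helper are all hypothetical, and the real catalog entries are C++ classes with more state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Property:
    name: str
    dtype: str


@dataclass
class PhysicalNodeTableEntry:
    """Child entry (hypothetical name): mirrors physical storage, primary key only."""
    name: str
    primary_key: Property


@dataclass
class ExternalNodeTableEntry:
    """Parent entry (hypothetical name): logical view aligned with the relational columns."""
    name: str
    external_db: str
    external_table: str
    properties: List[Property]
    physical: PhysicalNodeTableEntry

    def is_materialized(self, prop_name: str) -> bool:
        # Only the primary key lives in local storage; every other
        # property has to be fetched from the external table.
        return prop_name == self.physical.primary_key.name


pk = Property("id", "INT64")
comment = ExternalNodeTableEntry(
    name="Comment",
    external_db="ldbc.duckdb",  # assumed path, for illustration only
    external_table="comment",
    properties=[pk, Property("content", "STRING")],
    physical=PhysicalNodeTableEntry(name="Comment_pk", primary_key=pk),
)
```

Under these assumptions, `comment.is_materialized("id")` is true while `comment.is_materialized("content")` is false, which is the case where the join on the primary key is needed.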

Scan external table

When we run MATCH (a:label), where label points to an external relational table, we need to scan both the external relational table and the primary key column materialized in our storage, and then perform a join on the primary key.
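The scan-then-join step can be sketched as follows. This is a simplified illustration under stated assumptions: the description does not say which join algorithm is used, so a hash join is assumed here, and the function name, its parameters, and the use of the position in the primary key column as an internal offset are all hypothetical.

```python
def scan_external_node_table(external_rows, stored_primary_keys, pk_column):
    """Join rows scanned from the external table with locally stored primary keys.

    external_rows: iterable of dicts, one per row of the external table.
    stored_primary_keys: primary key column materialized in local storage;
        the position of each key stands in for its internal offset (assumption).
    pk_column: name of the primary key column in the external rows.
    """
    # Build side: hash table from primary key to internal offset.
    pk_to_offset = {pk: offset for offset, pk in enumerate(stored_primary_keys)}

    # Probe side: look up each external row by its primary key.
    joined = []
    for row in external_rows:
        offset = pk_to_offset.get(row[pk_column])
        if offset is not None:
            joined.append((offset, row))
    return joined


rows = [
    {"id": 10, "content": "first"},
    {"id": 30, "content": "third"},
    {"id": 99, "content": "not in local storage"},
]
result = scan_external_node_table(rows, [10, 20, 30], "id")
```

Rows whose key is missing from local storage (id 99 here) are dropped by the join, so only keys present on both sides survive.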

Some sanity-check benchmark numbers:

Setup: LDBC10 Comment table stored in a DuckDB database, 8 threads.

DuckDB native scanning: 0.3s. Kuzu scanning DuckDB: 2s. Scanning external database: copy primary key (6s) + join (3s) = 9s.

Being slower than DuckDB is expected, as we first need to materialize DuckDB's result and then re-scan it. The major overhead is in our scan of the DuckDB result; @acquamarin should see if we can optimize this further.

Another bottleneck is the copy of the primary key. I'm fairly confident I can bring its time down to ~2s with some optimization.

Fixes # (issue)

Contributor agreement

andyfengHKU avatar Aug 19 '24 20:08 andyfengHKU