kuzu
kuzu copied to clipboard
GDS Features and Optimizations
Description
I'm writing this to keep track of optimizations and new features we should be adding to Kuzu's GDS component. There is a lot of pieces to this, but this should capture the general scope
- [x] Spawn new threads when executing a GDSAlgorithm in task scheduler.
- [ ] Graph Interface Improvements:
- [ ] In-memory graph instance based on CSR
- [ ] Have optimized Graph implementations that scan only nbr nodeIDs. Have less optimized Graph implementations that scan both relIDs and nbrIDs.
- [ ] Improve on-disk to scan a set of vertex's neighbors
- [ ] Change Graph to return iterators instead of vectors when scanning lists
- [ ] If algorithms need it support double-directional in-memory graphs
- [ ] Frontier Interface Improvements:
- [ ] Make masks memory manager backed.
- [ ] Experiment with sparse-vs-dense implementation switching
- [ ] Experiment with the performance improvement of templating FrontierUpdateFn instead of virtual classes for VertexUpdate functions. This means templating functions that takes FrontierUpdateFn as input, specifically those in GDSUtils.
- [ ] Fix https://github.com/kuzudb/kuzu/issues/3716
- [ ] Shortest Paths Optimizations
- [ ] Materializing results in parallel
- [ ] Write Paths: with BFS-tree-based compact implementation (using EdgeUpdateLists)
- [ ] Weighted Shortest Paths
- [ ] If possible, skip writing compact BFS-tree-based storage to first to a FactorizedTable for further operators and then to ValueVectors. Instead write it as a compact data type directly to FactorizedTable and have a separate flatten operator that directly writes to ValueVectors.
- [ ] Bidirectional BFS
- [ ] Remove the 254 upper bound limit
- [ ] Avoid writing tableIDs when possible in Frontiers or SinglePaths or other data structures. We can do this optimization as follows: for each currentFixedTable or currentFixedLastEdges type of fields, store the tableID as a separate atomics.
- [ ] Rewrite existing PageRank and WCC
- [ ] Implement other algorithms: Identify a suite of algorithms and implement.