kuzu icon indicating copy to clipboard operation
kuzu copied to clipboard

GDS Features and Optimizations

Open semihsalihoglu-uw opened this issue 7 months ago • 0 comments

Description

I'm writing this to keep track of optimizations and new features we should be adding to Kuzu's GDS component. There is a lot of pieces to this, but this should capture the general scope

  • [x] Spawn new threads when executing a GDSAlgorithm in task scheduler.
  • [ ] Graph Interface Improvements:
    • [ ] In-memory graph instance based on CSR
    • [ ] Have optimized Graph implementations that scan only nbr nodeIDs. Have less optimized Graph implementations that scan both relIDs and nbrIDs.
    • [ ] Improve on-disk to scan a set of vertex's neighbors
    • [ ] Change Graph to return iterators instead of vectors when scanning lists
    • [ ] If algorithms need it support double-directional in-memory graphs
  • [ ] Frontier Interface Improvements:
    • [ ] Make masks memory manager backed.
    • [ ] Experiment with sparse-vs-dense implementation switching
  • [ ] Experiment with the performance improvement of templating FrontierUpdateFn instead of virtual classes for VertexUpdate functions. This means templating functions that takes FrontierUpdateFn as input, specifically those in GDSUtils.
  • [ ] Fix https://github.com/kuzudb/kuzu/issues/3716
  • [ ] Shortest Paths Optimizations
    • [ ] Materializing results in parallel
    • [ ] Write Paths: with BFS-tree-based compact implementation (using EdgeUpdateLists)
    • [ ] Weighted Shortest Paths
    • [ ] If possible, skip writing compact BFS-tree-based storage to first to a FactorizedTable for further operators and then to ValueVectors. Instead write it as a compact data type directly to FactorizedTable and have a separate flatten operator that directly writes to ValueVectors.
    • [ ] Bidirectional BFS
    • [ ] Remove the 254 upper bound limit
    • [ ] Avoid writing tableIDs when possible in Frontiers or SinglePaths or other data structures. We can do this optimization as follows: for each currentFixedTable or currentFixedLastEdges type of fields, store the tableID as a separate atomics.
  • [ ] Rewrite existing PageRank and WCC
  • [ ] Implement other algorithms: Identify a suite of algorithms and implement.

semihsalihoglu-uw avatar Jun 27 '24 11:06 semihsalihoglu-uw