Phase 2 Roadmap
Note: This only documents important features/changes that we aim to deliver in the next phase, and it will be refreshed and updated as we move on.
APIs, Data Ingestion and Exports
- [x] Integrate Arrow. Replace our CSV Reader with Arrow reading. Use the arrow reader to support copying from Parquet, Arrow, and CSV files.
- [ ] Add RESTful API.
- [x] Add API extension to convert to a graph format that we can feed to a JavaScript graph visualization.
- [x] Add API Extension to PytorchG
- [ ] Add API Extension to DGL
- [x] Add data export support, e.g., from Cypher
  - CSV export: COPY tbl TO 'output.csv' (HEADER, DELIMITER ',');
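As a rough illustration of what the `HEADER` and `DELIMITER` options in the export statement above describe, the following Python sketch writes a header row followed by delimiter-separated rows using only the standard library. It is not Kuzu code: the `export_rows` helper, the column names, and the sample rows are all hypothetical, and Kuzu's actual quoting and type formatting may differ.

```python
import csv
import io


def export_rows(columns, rows, header=True, delimiter=","):
    """Serialize rows to CSV text, optionally preceded by a header row.

    Hypothetical helper for illustration; it only mimics the file shape
    implied by the HEADER and DELIMITER options.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writerow(columns)  # column names come first when HEADER is set
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up table contents, for illustration only.
columns = ["name", "age"]
rows = [("Alice", 30), ("Bob", 25)]

csv_text = export_rows(columns, rows, header=True, delimiter=",")
print(csv_text)
```

Passing `header=False` or a different `delimiter` (for example `"|"`) changes only the first line and the separator, which is the same division of responsibility the two options suggest.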
Storage and Transactions
- [x] Complete full transaction support and Data Manipulation Clauses:
  - [x] REMOVE and DELETE of edges and nodes with edges
  - [x] SET of edge properties
- [ ] Checkpointing Optimizations (https://github.com/graphflowdb/graphflowdb/issues/771)
- [x] Get rid of ad-hoc/unstructured properties for now.
- [ ] Optimizations on lists data structure (https://github.com/graphflowdb/graphflowdb/issues/730).
- [x] Add system versioning information (including storage versioning and compatibility check).
- [x] Improved single buffer pool.
Front-end and Query Processor
- [x] Support UTF-8.
- [x] Full implementation of worst-case optimal joins.
- [x] Full implementation of ASP joins.
- [ ] String and Overflow File Enhancements: https://github.com/graphflowdb/graphflowdb/issues/980
- [ ] Optimization on Hash Join, e.g., Parallel Finalization of Hash Table
- [ ] Degree(a) function to compute node degrees (https://github.com/graphflowdb/graphflowdb/issues/554)
- [x] Unlabeled Node and Rel Queries (QP)
- [x] Undirected Edges (Storage, QP)
- [x] Shortest Paths (Frontend, QP)
- [x] Regular Expressions in string predicates (Frontend, QP)
- [x] Using paths/types in predicates (Frontend, QP)
- [x] CASE ELSE END
Refactoring
- [ ] Get rid of flat/unflat distinction.
- [ ] Un-template disk array, and move storage structures to use disk array.
- [x] De-couple planner and optimizer.
Usability
- [ ] Optional logging.
Other
- [x] ALTER TABLE
- [x] UDF Support
- [x] PROFILE fix
Great project. Is there a ballpark estimate on when there'll be an integration with networkX? (if ever?) Thanks! Raul
Hi @raulcf, thanks for your interest!
Currently, we are working on the integration with Pandas, through which Kuzu can be used with networkX, but we haven't prioritized the direct integration with networkX yet. We'd like to hear more use cases or feedback on integrating with networkX before we start working on it. Can you share some of your ideas, e.g., what you would expect from the integration, if possible?
For the arrow integration, any thoughts towards using the C data interface to allow for reading from generic arrow streams and tables, beyond just using their readers? This would allow richer direct zero-copy integrations (if that works at all in your architecture) with various libraries, similar to DuckDB, and would allow custom integrations.
Hi @eclshunter, thanks for the suggestion. Yes, integrating the arrow C data interface is a good idea, and it definitely works with our architecture. Currently, we are using arrow's C data interface for exporting query results to arrow tables, and we intend to use the arrow table format (through the C data interface) for importing from various Python libraries. Hopefully, data import will be our focus in the next two or three months :)
Love it, great to hear. That sounds like it'll hit my use cases. Thanks!
What's Kuzu's attitude towards Windows support? Is it something that might be considered for the future?
Hi @mediumrarez, Windows support is not prioritized yet, but the answer is yes, Windows support is on our future roadmap.
I think this is something someone outside of the core Kuzu team can take on. Several people want this. We attempted this once but there was a problem we couldn't fix (@mewim tried this I think), and someone who knows how to do this could help. @mewim maybe you can summarize what the problem was and someone who knows can either take it on or point to the solution.
@ray6080 is reading from arrow available anywhere yet? I just found the project and am looking into the possibility of testing it out with zero-copy arrow ingestion.
hi @nixent . thanks for your interest. we're working on reading from arrow right now. still need to wait a bit on this. 😄
@ray6080 I can help with testing when it is available
@nixent Not sure if you're already on there, but you can hear more about these (or ask such questions) via our Discord. We'll make it a point to engage with the community for the arrow table reader as this is an important feature. See you there!
Edit: As for when it will be available, it's actively being worked on. So hopefully soon :)
I'm closing this because we have long passed Phase 2.