Phase 2 Roadmap

Open semihsalihoglu-uw opened this issue 3 years ago • 12 comments
Note: This documents only the important features and changes that we aim to deliver in the next phase. It will be refreshed and updated as we progress.

APIs, Data Ingestion and Exports

  • [x] Integrate Arrow. Replace our CSV reader with Arrow-based reading, and use the Arrow reader to support copying from Parquet, Arrow, and CSV files.
  • [ ] Add a RESTful API.
  • [x] Add an API extension to convert results to a graph format that we can feed to JavaScript graph visualization.
  • [x] Add an API extension for PyTorch Geometric.
  • [ ] Add an API extension for DGL.
  • [x] Add data export support, e.g., a Cypher CSV export such as COPY tbl TO 'output.csv' (HEADER, DELIMITER ',');
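As a rough illustration of what the HEADER and DELIMITER options in the export example imply for the output file, here is a minimal sketch using only Python's standard csv module. It does not use Kùzu; the function name export_csv and the sample table contents are hypothetical and exist only for this illustration.

```python
import csv
import io


def export_csv(columns, rows, header=True, delimiter=","):
    """Serialize rows to CSV text, optionally preceded by a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    if header:
        # HEADER: the first line holds the column names.
        writer.writerow(columns)
    # One output line per row, fields separated by the chosen delimiter.
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical table contents, for illustration only.
columns = ["name", "age"]
rows = [("Alice", 30), ("Bob", 41)]

csv_text = export_csv(columns, rows, header=True, delimiter=",")
print(csv_text)
```

Calling export_csv with header=False and a different delimiter shows how the two options vary independently.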

Storage and Transactions

  • [x] Complete full transaction support and Data Manipulation Clauses:
    • [x] REMOVE and DELETE of edges, and of nodes with edges
    • [x] SET of edge properties
  • [ ] Checkpointing Optimizations (https://github.com/graphflowdb/graphflowdb/issues/771)
  • [x] Get rid of ad-hoc/unstructured properties for now.
  • [ ] Optimizations on lists data structure (https://github.com/graphflowdb/graphflowdb/issues/730).
  • [x] Add system versioning information (including storage versioning and compatibility check).
  • [x] Improved single buffer pool.

Front-end and Query Processor

  • [x] Support UTF-8.
  • [x] Full implementation of worst-case optimal joins.
  • [x] Full implementation of ASP joins.
  • [ ] String and Overflow File Enhancements: https://github.com/graphflowdb/graphflowdb/issues/980
  • [ ] Optimization on Hash Join, e.g., Parallel Finalization of Hash Table
  • [ ] Degree(a) function to compute node degrees (https://github.com/graphflowdb/graphflowdb/issues/554)
  • [x] Unlabeled Node and Rel Queries (QP)
  • [x] Undirected Edges (Storage, QP)
  • [x] Shortest Paths (Frontend, QP)
  • [x] Regular Expressions in string predicates (Frontend, QP)
  • [x] Using paths/types in predicates (Frontend, QP)
  • [x] CASE ELSE END
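For readers unfamiliar with worst-case optimal joins, the core idea can be sketched on a triangle query: instead of joining two edge tables and then filtering against a third, the candidates for the last variable are found by intersecting adjacency sets. This is a minimal sketch in plain Python, not Kùzu's implementation; the function name triangles and the sample edges are assumptions made for illustration.

```python
from collections import defaultdict


def triangles(edges):
    """Return all (a, b, c) such that edges a->b, b->c and a->c exist."""
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)

    result = []
    for a in sorted(out):
        for b in sorted(out[a]):
            # c must be a successor of both a and b, so intersect the two
            # adjacency sets rather than materializing all a->b->c paths.
            for c in sorted(out[a] & out.get(b, set())):
                result.append((a, b, c))
    return result


# Hypothetical edge list, for illustration only.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
found = triangles(edges)
print(found)
```

The intersection step is what keeps intermediate results small: a binary join plan would first enumerate every two-hop path before checking the closing edge.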

Refactoring

  • [ ] Get rid of flat/unflat distinction.
  • [ ] Un-template disk array, and move storage structures to use disk array.
  • [x] De-couple planner and optimizer.

Usability

  • [ ] Optional logging.

Other

  • [x] ALTER TABLE
  • [x] UDF Support
  • [x] PROFILE fix

semihsalihoglu-uw avatar Nov 12 '22 21:11 semihsalihoglu-uw

Great project. Is there a ballpark estimate on when there'll be an integration with networkX? (if ever?) Thanks! Raul

raulcf avatar Dec 08 '22 20:12 raulcf

Hi @raulcf , thanks for your interest!

Currently, we are working on the integration with Pandas, through which Kuzu can be used with networkX, but we haven't prioritized a direct integration with networkX yet. We'd like to hear more use cases and feedback on integrating with networkX before we start working on it. Could you share some of your ideas, e.g., what you would expect from the integration?

ray6080 avatar Dec 08 '22 21:12 ray6080

For the Arrow integration, any thoughts on using the C data interface to allow reading from generic Arrow streams and tables, beyond just using their readers? This would allow richer, direct zero-copy integrations (if that works at all in your architecture) with various libraries, similar to DuckDB, and would allow custom integrations.
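To make "zero-copy" concrete: the consumer reads the producer's memory directly instead of receiving a duplicate. The sketch below shows that idea with Python's built-in buffer protocol (array and memoryview). It is only an analogy and does not use Arrow or its C data interface; the variable names are illustrative.

```python
from array import array

# Producer owns a contiguous buffer of 64-bit integers (one "column").
column = array("q", [10, 20, 30, 40])

# Zero-copy hand-off: the view refers to the same memory as `column`.
view = memoryview(column)

# Copying hand-off, for contrast: a separate buffer with the same values.
copied = array("q", column)

# A later write by the producer is visible through the view but not the copy.
column[0] = 99
print(view[0], copied[0])
```

Because the view shares memory with the producer, the hand-off costs no extra memory per element, which is the property the comment above asks for.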

eclshunter avatar Apr 04 '23 07:04 eclshunter

Hi @eclshunter, thanks for the suggestion. Yes, integrating the Arrow C data interface is a good idea, and it definitely works with our architecture. Currently, we use Arrow's C data interface to export query results to Arrow tables, and we intend to use the Arrow table format (through the C data interface) for importing from various Python libraries. Hopefully, data import will be our focus in the next two or three months :)

ray6080 avatar Apr 04 '23 16:04 ray6080

Love it, great to hear. That sounds like it'll hit my use cases. Thanks!

eclshunter avatar Apr 04 '23 16:04 eclshunter

What's Kuzu's attitude towards Windows support? Is it something that might be considered for the future?

midrare avatar Apr 10 '23 08:04 midrare

Hi @mediumrarez , Windows support is not prioritized yet, but the answer is yes: Windows support is on our future roadmap.

ray6080 avatar Apr 10 '23 13:04 ray6080

I think this is something someone outside of the core Kuzu team can take on. Several people want this. We attempted it once but hit a problem we couldn't fix (@mewim tried this, I think), and someone who knows how to do this could help. @mewim, maybe you can summarize what the problem was, so that someone who knows can either take it on or point to the solution.

semihsalihoglu-uw avatar Apr 10 '23 16:04 semihsalihoglu-uw

@ray6080 is reading from Arrow available anywhere yet? I just found the project and am looking into the possibility of testing it out with zero-copy Arrow ingestion.

nixent avatar Mar 11 '24 13:03 nixent

Hi @nixent, thanks for your interest. We're working on reading from Arrow right now, so you'll still need to wait a bit for this. 😄

ray6080 avatar Mar 11 '24 15:03 ray6080

@ray6080 I can help with testing once it is available.

nixent avatar Mar 11 '24 20:03 nixent

@nixent Not sure if you're already on there, but you can hear more about these (or ask such questions) via our Discord. We'll make it a point to engage with the community for the arrow table reader as this is an important feature. See you there!

Edit: As for when it will be available, it's actively being worked on. So hopefully soon :)

prrao87 avatar Mar 11 '24 20:03 prrao87

I'm closing this because we have long passed Phase 2.

andyfengHKU avatar Jun 07 '24 19:06 andyfengHKU