incubator-graphar
OSPP24 Idea: Implement ETL CLI Tools for GraphAr
Describe the enhancement requested
Description
GraphAr is designed as a unified storage format for graph data. It provides a standardized format that makes graph data easy to import, export, exchange, and share. Beyond the foundational format design, GraphAr currently also offers libraries in C++, Java, Python, and Scala so that users can work with GraphAr formatted data across different programming environments.
To make GraphAr formatted data easier to use, we aim to provide a command-line tool built on these libraries. The tool will convert data from various sources into GraphAr format and, conversely, convert GraphAr formatted data into other formats.
This command-line tool needs to support the following features:
- A user-friendly command-line interface
- Graph data management: Users can use the CLI tool to view basic information about GraphAr formatted data, such as the number of nodes and edges, the properties, and the related schema information.
- Graph data import: Users can import data from other formats into GraphAr format through the CLI tool.
- Support for importing large-scale data: Users can use the CLI tool to import massive datasets into GraphAr format.
- (Optional) Graph data export: Users can export GraphAr formatted data into other formats using the CLI tool (lower priority).
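One possible way to organize the features listed above is one subcommand per feature: `show` for data management, `import` for conversion into GraphAr, and `export` for the optional reverse direction. The sketch below uses only Python's standard `argparse`. The program name `graphar-cli`, the subcommand names, the options, and the `chunked` helper are all hypothetical illustrations, not an existing GraphAr interface, and the handlers only print what a real tool would do. The `chunked` generator shows one common way to keep memory bounded for large-scale imports: it streams input rows in fixed-size batches instead of loading everything at once.

```python
"""Hypothetical skeleton for a GraphAr ETL command-line tool.

All command names and options here are illustrative assumptions; a real
tool would call into one of the GraphAr libraries in the handlers.
"""
import argparse
import itertools
from typing import Iterable, Iterator, List, Optional, Sequence

FORMATS = ["csv", "parquet", "orc"]  # assumed set of supported file types


def chunked(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield lists of at most chunk_size rows, reading the input lazily."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="graphar-cli",  # hypothetical program name
        description="Inspect, import, and export GraphAr data (sketch).",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # Graph data management: summarize counts and schema.
    show = sub.add_parser("show", help="print node/edge counts and schema")
    show.add_argument("graph_info", help="path to the graph info file")

    # Graph data import: other formats -> GraphAr.
    imp = sub.add_parser("import", help="convert other formats into GraphAr")
    imp.add_argument("source", help="input file or directory")
    imp.add_argument("--format", choices=FORMATS, default="csv")
    imp.add_argument("--chunk-size", type=int, default=1024,
                     help="rows written per output chunk")
    imp.add_argument("--output", required=True, help="target directory")

    # (Optional) graph data export: GraphAr -> other formats.
    exp = sub.add_parser("export", help="convert GraphAr data into other formats")
    exp.add_argument("graph_info", help="path to the graph info file")
    exp.add_argument("--format", choices=FORMATS, default="csv")
    exp.add_argument("--output", required=True, help="target directory")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command == "show":
        # Placeholder: load the graph info through a GraphAr library here.
        print(f"would summarize {args.graph_info}")
    elif args.command == "import":
        if args.chunk_size <= 0:
            parser.error("--chunk-size must be positive")  # exits with status 2
        print(f"would import {args.source} ({args.format}) into {args.output} "
              f"in chunks of {args.chunk_size} rows")
    else:
        print(f"would export {args.graph_info} to {args.output} as {args.format}")
    return 0
```

For example, `main(["import", "persons.csv", "--output", "out", "--chunk-size", "2"])` parses the options and reports the planned import; in a packaged tool, `main` would typically be registered as a console entry point. Keeping each feature in its own subcommand lets the lower-priority `export` command be added later without changing the others.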
Deliverables
- A CLI tool that meets the above requirements
- Detailed design and usage documentation
Component(s)
Other
Reference
parquet-cli is a good reference for CLI: https://github.com/apache/parquet-mr/tree/master/parquet-cli
@acezen Hi, I would like to ask whether you have time to check and reply to my email about OSPP, please.
I sent it to [email protected]
The Databend native CLI (bendsql) is another reference: https://github.com/datafuselabs/bendsql