OSPP24 Idea: Implement ETL CLI Tools for GraphAr

Open acezen opened this issue 10 months ago • 3 comments

Describe the enhancement requested

Description

GraphAr is designed as a unified storage format for graph data, aiming to provide a standardized format for easy import/export, as well as exchange and sharing of graph data. Beyond the foundational format design, GraphAr also offers libraries in C++, Java, Python, and Scala so that users can work with GraphAr formatted data across different programming environments.

To facilitate the use of GraphAr formatted data, we aim to provide a command-line tool based on these libraries. This tool will convert data from various sources into GraphAr formatted data and, in the other direction, transform GraphAr formatted data into other formats.

This command-line tool needs to support the following features:

  • A user-friendly command-line interface
  • Graph data management: Users can use the CLI tool to view basic information about the GraphAr formatted data, such as the number of nodes, edges, properties, and related Schema information.
  • Graph data import: Users can import data from other formats into GraphAr format through the CLI tool.
  • Support for importing large-scale data: Users can use the CLI tool to import massive datasets into GraphAr format.
  • (Optional) Graph data export: Users can export GraphAr formatted data into other formats using the CLI tool (lower priority).
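
To make the feature list above more concrete, here is a minimal sketch of what the command-line surface could look like, using only the Python standard library. Everything named here is an assumption for illustration: the tool name `graphar-cli`, the subcommands `show`, `import`, and `export`, the `--output`, `--chunk-size`, and `--format` flags, and the helper `read_in_chunks`. It does not call any GraphAr library API; the point where a GraphAr writer would be invoked is only marked with a comment. The chunked reader illustrates one possible way to handle large-scale imports without loading the whole source file into memory.

```python
import argparse
import csv
from itertools import islice
from typing import Iterator, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical `show`, `import`, and `export` subcommands."""
    parser = argparse.ArgumentParser(prog="graphar-cli")  # hypothetical tool name
    sub = parser.add_subparsers(dest="command", required=True)

    show = sub.add_parser("show", help="print basic information about a graph")
    show.add_argument("graph_info", help="path to a graph info file")

    imp = sub.add_parser("import", help="convert source data into GraphAr format")
    imp.add_argument("source", help="path to the source file, e.g. a CSV")
    imp.add_argument("--output", required=True, help="output directory")
    imp.add_argument("--chunk-size", type=int, default=100_000,
                     help="rows processed per batch")

    exp = sub.add_parser("export", help="convert GraphAr data into another format")
    exp.add_argument("graph_info", help="path to a graph info file")
    exp.add_argument("--format", choices=["csv"], default="csv")
    return parser


def read_in_chunks(path: str, chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `chunk_size` CSV rows, one batch at a time."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "import":
        total = 0
        for chunk in read_in_chunks(args.source, args.chunk_size):
            # Assumption: a real tool would pass each chunk to a GraphAr writer here.
            total += len(chunk)
        print(f"read {total} rows from {args.source}")
    else:
        # `show` and `export` are placeholders in this sketch.
        print(f"{args.command}: not implemented in this sketch")
    return 0
```

Under these assumptions, calling `main(["import", "edges.csv", "--output", "out", "--chunk-size", "50000"])` would stream `edges.csv` in batches of 50,000 rows. Subcommands keep each feature (inspect, import, export) independently discoverable through `--help`, which is similar in spirit to how parquet-cli organizes its commands.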

Deliverables

  1. A CLI tool that meets the above requirements
  2. Detailed design and usage documentation

Component(s)

Other

Reference

acezen avatar Apr 24 '24 08:04 acezen

parquet-cli is a good reference for CLI: https://github.com/apache/parquet-mr/tree/master/parquet-cli

acezen avatar May 21 '24 10:05 acezen

@acezen Hi, could you please check and reply to my email about OSPP when you have time?

I sent it to the [email protected]

ywh555hhh avatar May 27 '24 10:05 ywh555hhh

databend native CLI: https://github.com/datafuselabs/bendsql

acezen avatar Jul 11 '24 06:07 acezen