[Feat][C++] Add a `WriterOption` to allow users to configure writer options such as compression

Open acezen opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

Currently, the GraphAr C++ library supports writing chunks in different file formats (CSV, Parquet, and ORC) through Arrow's built-in file-format support. Arrow provides writer options for each file format to configure settings such as the compression type, but GraphAr writes only with the default options:

CSV: https://github.com/alibaba/GraphAr/blob/ad30121070c9dc115ac916ef620de29e2097af77/src/filesystem.cc#L205-L210
Parquet: https://github.com/alibaba/GraphAr/blob/ad30121070c9dc115ac916ef620de29e2097af77/src/filesystem.cc#L216-L220
ORC: https://github.com/alibaba/GraphAr/blob/ad30121070c9dc115ac916ef620de29e2097af77/src/filesystem.cc#L224-L225

Consider adding a GraphAr WriterOption that lets users configure the writer options.

Describe the solution you'd like

Implement a WriterOption like:

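class WriterOption {
 public:
  class builder {
   public:
    // Sets the compression type and returns the builder for chaining.
    inline builder* compression(CompressionType);
    // Creates the WriterOption from the settings collected so far.
    inline std::shared_ptr<WriterOption> build();
  };
};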

and when writing chunks with GraphAr, use:

WriterOption::builder builder;
builder.compression(CompressionType::ZSTD);
auto writer_option = builder.build();
auto writer = VertexChunkWriter(vertex_info, prefix, writer_option);

As a first step, we can support only the compression setting.
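To make the proposal concrete, here is a minimal, self-contained sketch of how the builder above might look when it carries only the compression setting. It is an illustration, not the GraphAr implementation: the values of `CompressionType` other than `ZSTD`, the `UNCOMPRESSED` default, the `compression()` getter, and the `MakeZstdOption` helper are all assumptions, and the sketch does not touch Arrow or the chunk writers.

```cpp
#include <memory>

// Hypothetical enum: the issue only names CompressionType::ZSTD,
// the remaining values are assumed for illustration.
enum class CompressionType { UNCOMPRESSED, SNAPPY, GZIP, ZSTD };

class WriterOption {
 public:
  class builder {
   public:
    // Records the compression type and returns the builder for chaining.
    builder* compression(CompressionType type) {
      compression_ = type;
      return this;
    }

    // Creates an immutable WriterOption from the collected settings.
    std::shared_ptr<WriterOption> build() const {
      return std::shared_ptr<WriterOption>(new WriterOption(compression_));
    }

   private:
    // Assumed default: no compression unless the user asks for one.
    CompressionType compression_ = CompressionType::UNCOMPRESSED;
  };

  // Assumed getter that a writer could consult when creating files.
  CompressionType compression() const { return compression_; }

 private:
  // Private so that options can only be created through the builder.
  explicit WriterOption(CompressionType type) : compression_(type) {}

  CompressionType compression_;
};

// Usage mirroring the snippet in this issue (helper name is hypothetical).
inline std::shared_ptr<WriterOption> MakeZstdOption() {
  WriterOption::builder builder;
  builder.compression(CompressionType::ZSTD);
  return builder.build();
}
```

Returning a pointer from `compression()` follows the signature in the proposal; a writer would then need to translate `compression()` into the matching option of each file format, which is outside this sketch.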

Additional context

#75

acezen, Feb 27 '23 03:02

cc/ @lixueclaire

acezen, Feb 27 '23 03:02