
Add parquet format to codecs

Open mattbailey opened this issue 3 years ago • 7 comments

Describe the problem you are trying to solve

Tremor cannot encode/decode parquet as a codec.

Describe the solution you'd like

Would be nice to have parquet as a supported codec format.

Notes

The official Rust implementation of Parquet can be found in the Apache Arrow project: https://github.com/apache/arrow-rs

mattbailey avatar Oct 26 '21 15:10 mattbailey

Oh yeah, let's make that happen!

mfelsche avatar Oct 26 '21 15:10 mfelsche

You might want to look at https://github.com/jorgecarleitao/parquet2 as well. It is a more idiomatic rewrite of parquet.

tobim avatar Mar 23 '22 15:03 tobim

If you also want to process data via IPC (e.g., network, UNIX pipes, shared mmap), then Arrow IPC would offer higher interop.

Arrow itself can read and write Parquet, which is typically used only as an on-disk file format.

mavam avatar Mar 23 '22 16:03 mavam

That's definitely worth looking at too! We generally try to separate the encoding (Arrow/Parquet) from the transport (UNIX socket, network, mmap, etc.) so that the parts become interchangeable. For example, we have a UNIX socket, a TCP, and a UDP connector, so by adding Arrow encoding we'd unlock all those transports at once :D
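To illustrate the encoding/transport split in general terms, here is a minimal sketch using only the Rust standard library. It is not Tremor's actual codec or connector API: the `Codec` trait, the `LengthPrefixed` codec (a stand-in for something like Arrow or Parquet), and the `send` helper are all hypothetical names made up for this example. The transport is modelled as anything implementing `std::io::Write`, which covers types such as `TcpStream`, `UnixStream`, or an in-memory `Vec<u8>`.

```rust
use std::io::{self, Write};

/// Hypothetical codec trait: turns events into bytes and back.
/// It knows nothing about how the bytes are moved around.
trait Codec {
    fn encode(&self, event: &str) -> Vec<u8>;
    fn decode(&self, bytes: &[u8]) -> Option<String>;
}

/// Toy codec standing in for a real format such as Arrow or Parquet:
/// a 4-byte big-endian length prefix followed by the UTF-8 payload.
struct LengthPrefixed;

impl Codec for LengthPrefixed {
    fn encode(&self, event: &str) -> Vec<u8> {
        let mut out = (event.len() as u32).to_be_bytes().to_vec();
        out.extend_from_slice(event.as_bytes());
        out
    }

    fn decode(&self, bytes: &[u8]) -> Option<String> {
        if bytes.len() < 4 {
            return None; // not enough bytes for the length prefix
        }
        let (head, body) = bytes.split_at(4);
        let len = u32::from_be_bytes([head[0], head[1], head[2], head[3]]) as usize;
        if body.len() != len {
            return None; // truncated or oversized payload
        }
        String::from_utf8(body.to_vec()).ok()
    }
}

/// Works with any codec and any byte sink, so a new codec is usable
/// over every existing transport without further changes.
fn send<C: Codec, W: Write>(codec: &C, transport: &mut W, event: &str) -> io::Result<()> {
    transport.write_all(&codec.encode(event))
}
```

Because `send` is generic over both parameters, swapping `Vec<u8>` for a `TcpStream` or `UnixStream` needs no change to the codec, and adding a second codec needs no change to any transport.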

Licenser avatar Mar 23 '22 16:03 Licenser

Adding this here for safekeeping: https://docs.rs/arrow/latest/arrow/index.html

Licenser avatar Mar 24 '22 12:03 Licenser