Feature request: Support Arrow data

Open • ghuls opened this issue 10 months ago • 1 comment

It would be nice if Codon supported Arrow data in the future, in addition to NumPy arrays.

nanoarrow, from the Apache Arrow project, should be relatively easy to embed. From its documentation:

The nanoarrow libraries are a set of helpers to produce and consume Arrow data, including the Arrow C Data, Arrow C Stream, and Arrow C Device, structures and the serialized Arrow IPC format. The vision of nanoarrow is that it should be trivial for libraries to produce and consume Arrow data: it helps fulfill this vision by providing high-quality, easy-to-adopt helpers to produce, consume, and test Arrow data types and arrays.

The nanoarrow libraries were built to be:

  • Small: nanoarrow’s C runtime compiles into a few hundred kilobytes and its R and Python bindings both have an installed size of ~1 MB.

  • Easy to depend on: nanoarrow’s C library is distributed as two files (nanoarrow.c and nanoarrow.h) and its R and Python bindings have zero dependencies.

  • Useful: The Arrow Columnar Format includes a wide range of data type and data encoding options. To the greatest extent practicable, nanoarrow strives to support the entire Arrow columnar specification (see the Arrow implementation status page for implementation status).

https://arrow.apache.org/nanoarrow/latest/index.html
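
To illustrate what consuming Arrow data through the Arrow C Data interface involves, here is a minimal sketch using only Python's standard `ctypes` module. The `ArrowArray` field layout follows the Arrow C Data Interface specification; the helpers `make_int64_array` and `read_int64_array` are hypothetical names invented for this example and are not part of nanoarrow, Arrow, or Codon. It assumes a primitive int64 array with no nulls and skips the `release` callback that a real producer must provide.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """Mirror of `struct ArrowArray` from the Arrow C Data Interface."""


ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    # A function pointer in the C struct; kept opaque in this sketch.
    ("release", ctypes.c_void_p),
    ("private_data", ctypes.c_void_p),
]


def make_int64_array(values):
    """Hypothetical producer: wrap Python ints as an int64 ArrowArray.

    Returns the struct plus the objects that own the memory, which the
    caller must keep alive (a real producer would free them in `release`).
    """
    n = len(values)
    data = (ctypes.c_int64 * n)(*values)
    # Primitive layout: buffers[0] is the validity bitmap (NULL is allowed
    # when there are no nulls), buffers[1] is the contiguous value buffer.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(data))
    arr = ArrowArray(
        length=n,
        null_count=0,
        offset=0,
        n_buffers=2,
        n_children=0,
        buffers=ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p)),
    )
    return arr, (data, buffers)


def read_int64_array(arr):
    """Hypothetical consumer: copy the values out of an int64 ArrowArray."""
    if arr.null_count != 0:
        raise ValueError("this sketch only handles arrays without nulls")
    data = ctypes.cast(
        ctypes.c_void_p(arr.buffers[1]), ctypes.POINTER(ctypes.c_int64)
    )
    # `offset` is a logical start index into the value buffer.
    return [data[arr.offset + i] for i in range(arr.length)]


arr, keepalive = make_int64_array([10, 20, 30])
print(read_int64_array(arr))
```

Because the value buffer of a null-free primitive array is a single contiguous block, a consumer can in principle view it as a flat numeric array without copying; the list comprehension above copies only to keep the sketch dependency-free.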

ghuls, Mar 05 '25 13:03

Thanks for the suggestion, @ghuls -- we're planning to support this via our Codon-native Pandas, which should be coming soon. It should also be straightforward to support Arrow->NumPy conversion directly.

arshajii, Mar 10 '25 14:03

Merging with #608

inumanag, Sep 30 '25 23:09