xarray-sql

Pyarrow integration idea

Open alxmrs opened this issue 1 year ago • 0 comments

PyArrow has a concept of a Dataset: a table, possibly larger than memory, that is partitioned across multiple files.

https://arrow.apache.org/docs/python/dataset.html

This seems like quite a good fit for this project. I can see two approaches:

  1. Write a function that returns a PyArrow Dataset from an Xarray Dataset, with the arrays unraveled into table rows.
  2. Create a PyArrow FileFormat that reads Zarr (or maybe Xarray?) and does the conversion to tables automatically (https://arrow.apache.org/docs/python/generated/pyarrow.dataset.FileFormat.html)

alxmrs • Feb 22 '24 10:02