xarray-sql
An experiment to query Xarray datasets with SQL
https://en.wikipedia.org/wiki/Online_analytical_processing?wprov=sfti1#Multidimensional_OLAP_(MOLAP) https://en.wikipedia.org/wiki/OLAP_cube OLAP cubes look like a wide body of research that this project could learn from to become more effective. Best not fail to learn from mistakes...
Figure out a way to distribute all layers of SQL execution #10 on Apache Beam.
Idea for creation: subclass sqlglot's Python executor (https://github.com/tobymao/sqlglot/blob/main/sqlglot/executor/python.py) and implement an alternative way to scan through a Table that has an Xarray dataset in it.
Let's demo this package with an automatic weather historian. Brief summary of how: Embed a text-to-SQL LLM in the browser (or, more likely, call one through an API). Let users ask questions that can...
How fast could we calculate the min, max, avg temperature for every atmospheric level of ARCO-ERA5? https://www.morling.dev/blog/one-billion-row-challenge/ https://medium.com/coiled-hq/one-trillion-row-challenge-5bfd4c3b8aef
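As a small-scale sketch of that aggregation, the following uses a synthetic dataset rather than ARCO-ERA5; the variable name `temperature`, the dimension names, and the helper `level_stats` are assumptions made for illustration. It reduces over every dimension except `level`, which is roughly what a `SELECT level, MIN(temperature), MAX(temperature), AVG(temperature) ... GROUP BY level` query would express.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a temperature variable. The names and shapes here
# are assumptions for illustration, not the actual ARCO-ERA5 schema.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "temperature": (
            ("time", "level", "latitude", "longitude"),
            rng.normal(250.0, 20.0, size=(4, 3, 5, 6)),
        )
    },
    coords={
        "time": np.arange(4),
        "level": [500, 850, 1000],
        "latitude": np.linspace(-90, 90, 5),
        "longitude": np.linspace(0, 300, 6),
    },
)


def level_stats(ds, var="temperature", by="level"):
    """Reduce over every dimension except `by`, similar to a SQL GROUP BY."""
    da = ds[var]
    reduce_dims = [d for d in da.dims if d != by]
    return xr.Dataset(
        {
            "min": da.min(dim=reduce_dims),
            "max": da.max(dim=reduce_dims),
            "avg": da.mean(dim=reduce_dims),
        }
    )


stats = level_stats(ds)
print(stats.to_dataframe())
```

On a real dataset the same reduction would run chunk by chunk if the arrays are Dask-backed, so the timing question above is mostly about I/O and scheduling rather than the arithmetic.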
It may be possible to use Xarray as a backend for an Iceberg table. https://py.iceberg.apache.org/api/ This would be similar to #20 and another way of approximating #4.
It looks like this idea exists elsewhere in this ecosystem! Iris's Cube can convert to a Pandas Dataframe: https://github.com/SciTools/iris/pull/5074
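For comparison, Xarray itself offers a similar conversion through `Dataset.to_dataframe()`. The tiny dataset below is made up for illustration; it only shows how a gridded variable becomes one row per cell, with the coordinates as ordinary columns.

```python
import numpy as np
import xarray as xr

# A made-up 2 x 3 grid; variable and dimension names are illustrative only.
ds = xr.Dataset(
    {"temperature": (("time", "level"), np.arange(6.0).reshape(2, 3))},
    coords={"time": [0, 1], "level": [500, 850, 1000]},
)

# to_dataframe() yields a MultiIndex over the dimensions; reset_index()
# turns those index levels into regular columns, one row per grid cell.
df = ds.to_dataframe().reset_index()
print(df)
```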
By subclassing ddf, we could provide a few key features to make Qarray better: - Centralized Metadata, like df shape / size (#18) - MultiIndex indexes, like pandas. These would...
Pyarrow has a concept of a Dataset, which is a table partitioned across files that are larger than memory. https://arrow.apache.org/docs/python/dataset.html This seems like quite a good fit for this project....
https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html Can we write a Zarr parser in Rust that reads chunks, unravels them, and packages them in partitions as a ListingTable? This could be one way to implement #4.
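To make the "unravel" step concrete, here is a sketch in Python with NumPy rather than Rust. The function `unravel_chunk` and its parameters are hypothetical and not part of Zarr or DataFusion: it takes one N-dimensional chunk plus its offset in the full array and returns flat, equal-length columns, which is the shape a table partition would need.

```python
import numpy as np


def unravel_chunk(chunk, offsets, coords, dims, name):
    """Flatten one N-d chunk into a dict of equal-length 1-D columns.

    offsets: start index of the chunk along each dimension of the full array.
    coords:  mapping of dimension name to that dimension's coordinate values.
    """
    # Integer position of every cell within the chunk, one row per dimension,
    # in the same row-major order that reshape(-1) uses for the values.
    local = np.indices(chunk.shape).reshape(len(dims), -1)
    columns = {}
    for axis, dim in enumerate(dims):
        # Shift to positions in the full array, then look up coordinates.
        columns[dim] = np.asarray(coords[dim])[local[axis] + offsets[axis]]
    columns[name] = chunk.reshape(-1)
    return columns


# A 3 x 4 array split so that this chunk covers rows 1..2 and columns 2..3.
full = np.arange(12).reshape(3, 4)
coords = {"y": [10, 20, 30], "x": [0, 1, 2, 3]}
chunk = full[1:3, 2:4]

cols = unravel_chunk(chunk, offsets=(1, 2), coords=coords, dims=("y", "x"), name="v")
print(cols)
```

Each chunk can be unraveled independently of the others, which is what would make a chunk-per-partition layout plausible.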