Polars Support
It would be great to offer Polars support. It is currently about half as popular as Pandas and generally works better for large datasets. Polars is bound to replace most data scientists' day-to-day operations within the next five years.
Thanks for developing bigframes, it is very useful.
What kind of polars support would you find useful? Would you want BigQuery DataFrames to have a polars-like DataFrame API (as an alternative to the current pandas-like one), or simply to interoperate with polars objects more easily?
I would like automatic schema supply. This is currently the limiting step in automatically uploading Polars DataFrames: write_ndjson seems to be the only way I can upload list dtypes (Parquet seems to not be viable, see this issue), but NDJSON requires the schema to be passed. I'm really looking for something that will just let me put my Polars DataFrame in a BQ table without fiddling with schemas. The DataFrame should already carry enough information to do that for me.
For going from BigQuery DataFrames to polars, I'm adding a to_arrow method in https://github.com/googleapis/python-bigquery-dataframes/pull/807 as well as an example for how to create a polars DataFrame from the results.
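A minimal sketch of that direction, assuming a BigQuery DataFrames version that includes the to_arrow method from the PR above and that it returns a pyarrow Table. The helper name bigframes_to_polars is hypothetical; polars.from_arrow is the existing polars function for wrapping Arrow data.

```python
def bigframes_to_polars(bf_df):
    """Convert a BigQuery DataFrames DataFrame to a polars DataFrame via Arrow.

    Hypothetical helper: assumes `bf_df` exposes to_arrow() and that it
    returns a pyarrow.Table.
    """
    # Imported lazily so polars is only needed when this helper is called.
    import polars as pl

    arrow_table = bf_df.to_arrow()  # materializes the results as Arrow data
    return pl.from_arrow(arrow_table)
```

With a real session this would be called on the result of something like bpd.read_gbq(...), and no pandas DataFrame is built along the way.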
For uploading to BigQuery, I have updated the polars docs to indicate how to get BigQuery to correctly handle list types https://github.com/pola-rs/polars/pull/20292
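As a rough illustration of the upload path, here is a sketch that writes the polars DataFrame to an in-memory Parquet file and loads it with list inference turned on. That enable_list_inference is the relevant setting is my assumption about what the linked docs change describes; the helper name and its parameters are hypothetical, while ParquetOptions, LoadJobConfig and load_table_from_file are existing google-cloud-bigquery APIs.

```python
import io


def upload_polars_with_lists(df, client, destination):
    """Load a polars DataFrame into BigQuery through an in-memory Parquet file.

    Hypothetical helper. `client` is a google.cloud.bigquery.Client and
    `destination` is a table ID such as "project.dataset.table".
    """
    # Imported lazily so the client library is only needed at call time.
    from google.cloud import bigquery

    # Assumption: list inference is the setting that lets Parquet LIST
    # columns load as repeated fields.
    parquet_options = bigquery.ParquetOptions()
    parquet_options.enable_list_inference = True

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        parquet_options=parquet_options,
    )

    with io.BytesIO() as stream:
        df.write_parquet(stream)
        stream.seek(0)  # rewind so the upload starts at the first byte
        job = client.load_table_from_file(
            stream, destination, job_config=job_config
        )
        return job.result()  # block until the load job finishes
```

Because Parquet carries its own schema, no explicit schema is passed here, which is the part the NDJSON route cannot avoid.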
I think that read_polars and to_polars methods would be reasonable requests for bigframes. I have done some refactoring recently to our I/O that might make it a bit easier, but it would probably require a little more refactoring to have pyarrow Tables/RecordBatches as the intermediate format instead of pandas DataFrames. The other thing to be careful about is that polars would need to be an optional "extra" dependency in setup.py to avoid a hard dependency on the polars package.
Edit: Or at the very least, a read_arrow(...) method to correspond to the to_arrow() I implemented in #807. There are fewer concerns with depending on pyarrow in bigframes because we already have that as a required dependency.
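To illustrate the optional "extra" concern, here is a sketch of the usual lazy-import guard. The function name require_polars and the extra name in the error message are assumptions for illustration, not existing bigframes code; in setup.py the matching piece would be an extras_require entry for polars.

```python
import importlib


def require_polars():
    """Return the polars module, or raise a helpful error if it is missing.

    Hypothetical guard: the "bigframes[polars]" extra name is an assumption.
    """
    try:
        # Imported on demand so that importing the library itself never
        # requires polars to be installed.
        return importlib.import_module("polars")
    except ImportError as exc:
        raise ImportError(
            "polars is required for this feature. "
            "Install it with: pip install 'bigframes[polars]'"
        ) from exc
```

A to_polars or read_polars method would call this guard first, so users who never touch polars pay no install or import cost.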
Amazing! Should this issue be closed now?
I just mailed https://github.com/googleapis/python-bigquery-dataframes/pull/1855 with bpd.read_arrow(pyarrow.Table) to round out the other side of this conversion.
Technically I think this was possible before by going through the DataFrame constructor, but that ended up translating to pandas as an in-between layer. Now we can just go from polars -> Arrow -> BigFrames without pandas in the middle.
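A short sketch of that polars -> Arrow -> BigFrames path, assuming a bigframes version that includes bpd.read_arrow from the PR above. The helper name polars_to_bigframes is hypothetical; to_arrow() is the existing polars DataFrame method that returns a pyarrow Table.

```python
def polars_to_bigframes(pl_df):
    """Load a polars DataFrame into BigQuery DataFrames via Arrow.

    Hypothetical helper: assumes bigframes.pandas exposes read_arrow and
    that it accepts a pyarrow.Table.
    """
    # Imported lazily; requires a configured BigQuery session at call time.
    import bigframes.pandas as bpd

    arrow_table = pl_df.to_arrow()  # polars -> pyarrow.Table
    return bpd.read_arrow(arrow_table)  # Arrow -> BigFrames, no pandas step
```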