
Remove PyArrow dependency for Polars support

Open MarcoGorelli opened this issue 7 months ago • 15 comments

What is your suggestion?

Currently, PyArrow is required by Altair for Polars support. I think it shouldn't be too hard to remove it, given that Polars implements the dataframe interchange protocol natively (without depending on PyArrow).

If #3384 can make it in, then Altair would actually support plotting Polars dataframes natively without any extra heavy dependencies. That'd be...pretty amazing? I'd suggest using Altair for polars.DataFrame.plot if that was the case.

I think what would need doing is:

  • don't require pyarrow to be installed for the dfi = data.__dataframe__ part
  • instead of using sanitize_arrow_table, for Polars, just select date/datetime columns and call .dt.to_string()
  • instead of using to_pylist from PyArrow, just use DataFrame.rows(named=True) for Polars
  • for categoricals, find a non-pyarrow workaround for Polars in infer_vegalite_type_for_dfi_column. I haven't tried this yet, but it looks straightforward-ish

Would you be open to considering this? Happy to work on a PR if so.

Have you considered any alternative solutions?

Just keep the status quo :) But, I think Altair is the only plotting library that gets close to native Polars support without large extra dependencies, and it doesn't look like a large stretch to go all the way there, so I'm hoping we can do it 💪


Demo from having tried this locally:

image

MarcoGorelli, Jun 26 '24 10:06