Integrate Polars library

Open albertvillanova opened this issue 3 years ago • 5 comments

Check potential integration of the Polars library: https://github.com/pola-rs/polars

  • Benchmark: https://h2oai.github.io/db-benchmark/

CC: @thomwolf @lewtun

albertvillanova avatar Nov 29 '21 12:11 albertvillanova

If possible, a neat API could be something like Dataset.to_polars(), as well as Dataset.set_format("polars")

lewtun avatar Nov 29 '21 12:11 lewtun

Note they use a "custom" implementation of Arrow: Arrow2.

albertvillanova avatar Nov 29 '21 13:11 albertvillanova

Polars has grown rapidly in popularity over the last year - could you consider integrating the Polars functionality again?

I don't think the "custom" implementation should be a barrier; it still conforms to the Arrow specification.

braaannigan avatar Nov 01 '22 15:11 braaannigan

Is there any direction on this from the HF team, @lewtun? Can conversion from a Polars DataFrame to an HF dataset be implemented with limited or zero copying? So, something like Dataset.from_polars() and Dataset.to_polars(), as you mentioned. Happy to contribute if I can get some pointers on how this might be implemented.

amrit110 avatar Sep 12 '23 14:09 amrit110

Hi, are there any updates? Thanks!

fzyzcjy avatar Mar 16 '24 01:03 fzyzcjy

> Hi, are there any updates? Thanks!

The feature has been there for a bit 😊 You can call dataset.to_polars() (on a Dataset, not a DatasetDict). The issue can be closed, I guess! @lhoestq

baggiponte avatar Aug 30 '24 20:08 baggiponte

Looks great and thanks!

fzyzcjy avatar Aug 30 '24 23:08 fzyzcjy

Thank you.

albertvillanova avatar Aug 31 '24 05:08 albertvillanova