datasets
Integrate Polars library
Check potential integration of the Polars library: https://github.com/pola-rs/polars
- Benchmark: https://h2oai.github.io/db-benchmark/
CC: @thomwolf @lewtun
If possible, a neat API could be something like Dataset.to_polars(), as well as Dataset.set_format("polars")
Note they use a "custom" implementation of Arrow: Arrow2.
Polars has grown rapidly in popularity over the last year - could you consider integrating the Polars functionality again?
I don't think the "custom" implementation should be a barrier, since it still conforms to the Arrow specification.
Is there some direction regarding this from the HF team @lewtun? Can conversion from polars to HF dataset be implemented with limited/zero copy? So, something like Dataset.from_polars() and Dataset.to_polars() like you mentioned. Happy to contribute if I can get some pointers on how this may be implemented.
Hi, are there any updates? Thanks!
The feature has been there for a bit 😊 You can call dataset.to_polars() (on a Dataset, not a DatasetDict). The issue can be closed, I guess! @lhoestq
Looks great and thanks!
Thank you.