kedro-plugins
Polars SQL datasets
Description
It would be great to have Polars implementations of SQLQueryDataset and SQLTableDataset, similar to the Pandas versions: pandas.SQLTableDataset and pandas.SQLQueryDataset.
Context
Sometimes users would like to read Polars DataFrames from, and write them to, SQL databases directly.
Possible Implementation
The datasets would have a similar implementation to the pandas versions, but would use the Polars built-ins read_database (a top-level function) and write_database (a DataFrame method).
P.S. I'm happy to work on that ☺️
Would the Ibis dataset already support Polars as a backend?
It does, but:
- I'm guessing the read_database would need to be implemented in Ibis.
- If a user just wants to use Polars syntax in their nodes, I guess it's a fair ask.
It's a separate question whether Polars is the best way to manipulate data in a database (a definite downside is pulling the data into memory for manipulation, rather than pushing the compute down to the database), but a user may still want to do it.
I would recommend creating polars.DatabaseDataset instead of mirroring the pandas datasets, because:
- Polars provides symmetrical read and write methods.
- "SQL" in the name is less explicit, because Polars SQL is also a thing.
Makes sense. @AntonNikishin, is this something you would like to work on?
@noklam I am happy to work on this if it is still open.
@MinuraPunchihewa Pretty sure this is still up for grabs! I'll assign you.