kedro-plugins

Polars SQL datasets

Open AntonNikishin opened this issue 1 year ago • 6 comments

Description

It would be great to have Polars implementations of SQLQueryDataset and SQLTableDataset, similar to the Pandas versions: pandas.SQLTableDataset and pandas.SQLQueryDataset.

Context

Sometimes users would like to read Polars DataFrames from, and write them to, SQL databases directly.

Possible Implementation

The datasets will be implemented similarly to the Pandas versions, but will use the Polars built-in functions read_database and write_database.

P.S. I'm happy to work on that ☺️

AntonNikishin avatar Sep 27 '24 12:09 AntonNikishin

Would the ibis dataset already support Polars as a backend?

noklam avatar Sep 27 '24 13:09 noklam

> Would the ibis dataset already support Polars as a backend?

It does, but:

  1. I'm guessing the read_database would need to be implemented in Ibis.
  2. If a user just wants to use Polars syntax in their nodes, I guess it's a fair ask.

It's a separate question whether Polars is the best way to manipulate data in a database (definite downside is pulling it into memory for manipulation, rather than pushing down compute), but a user may still want to do it.

deepyaman avatar Sep 30 '24 10:09 deepyaman

I would recommend creating polars.DatabaseDataset instead of mirroring the pandas datasets, because:

  1. Polars provides symmetrical read and write methods.
  2. "SQL" in the name is ambiguous, because Polars SQL is also a thing.
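
For illustration only, the single-dataset idea could be sketched roughly as below. This is not an existing kedro-datasets class: the field names (table_name, uri, load_args, save_args) and the default query are hypothetical, and the class deliberately does not subclass Kedro's AbstractDataset so that the sketch stays self-contained. It assumes a recent Polars release in which pl.read_database_uri and DataFrame.write_database both accept a connection URI, plus whichever optional database driver those functions need in your environment.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatabaseDataset:
    """Hypothetical sketch of a symmetric Polars database dataset.

    A real plugin would subclass kedro.io.AbstractDataset; that is
    omitted here so the example runs without Kedro installed.
    """

    table_name: str  # assumed to come from trusted catalog config
    uri: str  # connection URI, e.g. "sqlite:///data.db"
    load_args: dict[str, Any] = field(default_factory=dict)
    save_args: dict[str, Any] = field(default_factory=dict)

    def load(self) -> Any:
        # Imported lazily so the class can be defined without Polars.
        import polars as pl

        # Optional "query" override; otherwise read the whole table.
        query = self.load_args.get("query", f"SELECT * FROM {self.table_name}")
        extra = {k: v for k, v in self.load_args.items() if k != "query"}
        return pl.read_database_uri(query, self.uri, **extra)

    def save(self, data: Any) -> None:
        # Default to replacing the table; save_args can override this.
        args = {"if_table_exists": "replace", **self.save_args}
        data.write_database(self.table_name, self.uri, **args)

    def describe(self) -> dict[str, Any]:
        # Leave the URI out, since it may embed credentials.
        return {
            "table_name": self.table_name,
            "load_args": self.load_args,
            "save_args": self.save_args,
        }
```

Because load and save share one table name and one URI, a single class covers both directions, which is the symmetry point 1 refers to. Whether a custom query belongs in load_args or in a separate dataset is a design choice this sketch does not settle.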

deepyaman avatar Sep 30 '24 10:09 deepyaman

Makes sense. @AntonNikishin, is this something you would like to work on?

noklam avatar Nov 11 '24 10:11 noklam

@noklam I am happy to work on this if it is still open.

MinuraPunchihewa avatar Nov 27 '24 17:11 MinuraPunchihewa

> @noklam I am happy to work on this if it is still open.

@MinuraPunchihewa Pretty sure this is still up for grabs! I'll assign you.

deepyaman avatar Dec 08 '24 04:12 deepyaman