kedro-plugins icon indicating copy to clipboard operation
kedro-plugins copied to clipboard

feat(datasets): Added the Experimental PolarsDatabaseDataset

Open MinuraPunchihewa opened this issue 10 months ago • 3 comments

Description

This PR adds the PolarsDatabaseDataset to support interactions with databases using Polars.

Fixes https://github.com/kedro-org/kedro-plugins/issues/853

Development notes

Quite a bit of code has been copied over from SQLQueryDataset in this implementation.

These changes have been tested,

  1. Manually, by running the code locally to load and save Polars DataFrames from and to SQLite files. ~~2. Via the existing and newly added unit tests.~~

Checklist

  • [X] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • [ ] Updated the documentation to reflect the code changes
  • [ ] Added a description of this change in the relevant RELEASE.md file
  • [ ] Added tests to cover my changes
  • [ ] Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

MinuraPunchihewa avatar Jan 14 '25 18:01 MinuraPunchihewa

Hey @noklam, @deepyaman, I was able to come up with this implementation for the PolarsDatabaseDataset by extending SQLQueryDataset and it seems to work quite well (at least load() does).

Should we implement save() as well? This would require a table name to be provided as parameter.

Or do you have different thoughts on how this dataset ought to be implemented?

MinuraPunchihewa avatar Jan 14 '25 18:01 MinuraPunchihewa

Hey @noklam, @deepyaman, I was able to come up with this implementation for the PolarsDatabaseDataset by extending SQLQueryDataset and it seems to work quite well (at least load() does).

Sorry for the late response; didn't see this.

While there may be opportunities to reduce code copying more broadly, most datasets just inherit from AbstractDataset or AbstractVersionedDataset. Here, inheriting from pandas.SQLQueryDataset adds a pandas dependency, so we shouldn't do that.

Should we implement save() as well? This would require a table name to be provided as parameter.

Probably, because Polars supports it. You can make the table name optional but require it for save.

deepyaman avatar Jan 27 '25 17:01 deepyaman

Hey @noklam and @deepyaman, I've refactored this by removing the dependency on pandas.SQLQueryDataset as @deepyaman suggested. I copied over quite a bit of code from there though since these work quite similarly. I think the duplication can definitely be removed to a large extent by moving some of the common functions to _utils. Should I handle this in a different PR since pandas.SQLQueryDataset is already a fixed (non-experimental) dataset? Let me know what you think and I will add the finishing touches to this PR (tests and so forth).

MinuraPunchihewa avatar Feb 01 '25 17:02 MinuraPunchihewa

Please add to https://github.com/MinuraPunchihewa/kedro-plugins/blob/feature/polars_database_dataset/kedro-datasets/docs/pages/api/kedro_datasets_experimental/index.md so the documentation appears in https://kedro--990.org.readthedocs.build/projects/kedro-datasets/en/990/pages/api/kedro_datasets/ 🙏🏼

Done!

MinuraPunchihewa avatar May 28 '25 04:05 MinuraPunchihewa

Now https://kedro--990.org.readthedocs.build/projects/kedro-datasets/en/990/pages/api/kedro_datasets_experimental/polars.PolarsDatabaseDataset.md gives HTTP 404 though 👀

astrojuanlu avatar May 28 '25 06:05 astrojuanlu

Now https://kedro--990.org.readthedocs.build/projects/kedro-datasets/en/990/pages/api/kedro_datasets_experimental/polars.PolarsDatabaseDataset.md gives HTTP 404 though 👀

Oh, apologies. I thought I had already added the docs, but I haven't. I will do it soon.

MinuraPunchihewa avatar May 28 '25 06:05 MinuraPunchihewa

Hi @MinuraPunchihewa, just following up - do you have a plan to complete that update?

DimedS avatar Jul 01 '25 14:07 DimedS

I'm interested in this dataset 😄 will try to bring it back to life

astrojuanlu avatar Jul 02 '25 13:07 astrojuanlu

This is now working https://kedro--990.org.readthedocs.build/projects/kedro-datasets/en/990/api/kedro_datasets_experimental/polars.PolarsDatabaseDataset/

Ready for a round of code reviews ✔️

astrojuanlu avatar Jul 02 '25 13:07 astrojuanlu

Apologies, guys. I could not find the time earlier to finish this up. @astrojuanlu Thank you very much for your help.

MinuraPunchihewa avatar Jul 11 '25 15:07 MinuraPunchihewa

I'll address @ElenaKhaustova 's comment ☝🏼 but how do I fix the detect-secrets thing?

astrojuanlu avatar Jul 16 '25 15:07 astrojuanlu

I'll address @ElenaKhaustova 's comment ☝🏼 but how do I fix the detect-secrets thing?

I've addressed the comment and fixed the detect-secrets check as well.

merelcht avatar Jul 29 '25 15:07 merelcht