kedro-plugins
Query endpoint for `SnowparkTableDataset`
Description
The `SnowparkTableDataset` configuration has no query endpoint, so database-level SQL queries cannot be run from the catalog. Users have to do the filtering in the database instead: they first execute a query to filter the data and only then run the Kedro pipeline. Users expect it to work like `SQLQueryDataset` and `GBQQueryDataset`, which both have a query endpoint.
https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.snowflake.SnowparkTableDataset.html
We propose to:
- Explore the feasibility of adding a query endpoint to the dataset configuration.
- Enhance documentation with tutorials and working examples of how to run SQL queries with Ibis in such cases instead: https://kedro.org/blog/sql-data-processing-in-kedro-ml-pipelines.
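To make the first proposal more concrete, here is a minimal sketch of a dataset that accepts either a table name or a query. It is not the `SnowparkTableDataset` API: the class `TableOrQueryDataset`, its `table_name` and `query` parameters, and the `weather` table are all hypothetical, and the standard library's `sqlite3` stands in for a Snowflake session so the example runs anywhere.

```python
import sqlite3
from typing import Optional


class TableOrQueryDataset:
    """Hypothetical stand-in, NOT part of kedro-datasets.

    Loads either a whole table or the result of a SQL query,
    mirroring the "query endpoint" idea discussed in this issue.
    """

    def __init__(
        self,
        connection: sqlite3.Connection,
        table_name: Optional[str] = None,
        query: Optional[str] = None,
    ) -> None:
        # Exactly one of the two options must be given.
        if (table_name is None) == (query is None):
            raise ValueError("Provide exactly one of 'table_name' or 'query'.")
        self._connection = connection
        self._table_name = table_name
        self._query = query

    def load(self) -> list:
        if self._query is not None:
            # Query mode: the filtering happens inside the database.
            sql = self._query
        else:
            # Table mode: read every row of the configured table.
            sql = f'SELECT * FROM "{self._table_name}"'
        return self._connection.execute(sql).fetchall()


# In-memory SQLite database used only as a stand-in for Snowflake.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE weather (city TEXT, temperature REAL)")
connection.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("Oslo", 4.5), ("Cairo", 31.0), ("Lima", 19.2)],
)

full_table = TableOrQueryDataset(connection, table_name="weather")
filtered = TableOrQueryDataset(
    connection,
    query="SELECT city FROM weather WHERE temperature > 15 ORDER BY city",
)

all_rows = full_table.load()     # all three rows
warm_cities = filtered.load()    # only the rows matching the query
```

With a query option like this, the filter would live in the catalog entry next to the dataset definition, so no separate database step would be needed before running the pipeline.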
Context
- "If I had a query functionality here, then I would have just put that query here and run it from the catalog."
- "I know this query function is available in the SQL and GBQ connection, but not for Snowpark connection back then." See `SQLQueryDataset` (https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.pandas.SQLQueryDataset.html) and `GBQQueryDataset` (https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.pandas.GBQQueryDataset.html).