
Query endpoint for `SnowparkTableDataset`

Open ElenaKhaustova opened this issue 8 months ago • 3 comments

Description

The `SnowparkTableDataset` configuration has no query endpoint, so database-level SQL queries cannot be run from the catalog. Users therefore have to do this at the database level: first execute a query to filter the data, and only then run the Kedro pipeline. Users expect it to work like `SQLQueryDataset` and `GBQQueryDataset`, which both have a query endpoint.

https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.snowflake.SnowparkTableDataset.html

We propose to:

  1. Explore the feasibility of adding a query endpoint in dataset configuration.
  2. Enhance documentation with tutorials and working examples of how to run SQL queries with Ibis in such cases instead: https://kedro.org/blog/sql-data-processing-in-kedro-ml-pipelines.
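
To make the first proposal concrete, here is a minimal sketch of what a query-backed dataset could look like. It is an illustration only, not an existing kedro-datasets class: the name `SnowparkQueryDataset`, the `sql` argument, and the `SupportsSql` protocol are all assumptions. The sketch deliberately does not subclass Kedro's `AbstractDataset` or build its own connection. It only assumes it is handed an object with a `sql(query)` method, which is what a Snowpark `Session` provides.

```python
from typing import Any, Protocol


class SupportsSql(Protocol):
    """Anything exposing ``sql(query)``, for example a Snowpark ``Session``."""

    def sql(self, query: str) -> Any: ...


class SnowparkQueryDataset:
    """Hypothetical read-only dataset that loads the result of a SQL query."""

    def __init__(self, sql: str, session: SupportsSql) -> None:
        # Reject empty queries early, before anything reaches the database.
        if not sql or not sql.strip():
            raise ValueError("'sql' must be a non-empty query string")
        self._sql = sql
        self._session = session

    def load(self) -> Any:
        # Delegate to the session; with a real Snowpark session this returns
        # a DataFrame that is evaluated lazily on the Snowflake side.
        return self._session.sql(self._sql)

    def save(self, data: Any) -> None:
        # Assumption: a query dataset is read-only, as query-style datasets
        # usually are.
        raise NotImplementedError("SnowparkQueryDataset is read-only")
```

With something along these lines, the filtering query would live next to the dataset definition instead of being run by hand before the pipeline starts. A real implementation would still need to handle credentials, session reuse, and catalog configuration, none of which this sketch covers.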

Context

  • "If I had a query functionality here, then I would have just put that query here and run it from the catalog."


  • "I know this query function is available in the SQL and GBQ connection, but not for Snowpark connection back then."
    https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.pandas.SQLQueryDataset.html
    https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.pandas.GBQQueryDataset.html

ElenaKhaustova, Jun 06 '24 14:06