[DataCatalog]: Provide public methods to modify catalog

Open ElenaKhaustova opened this issue 8 months ago • 1 comments

Description

Plugin developers and advanced users are limited by the absence of public methods for modifying catalog datasets and for injecting dynamic behaviour or configuration parameters on the fly during pipeline execution. These limitations are intentional: the corresponding public APIs were deliberately left out. In practice, however, users bypass them by relying on private APIs.

We propose to:

  1. Rethink the concept of keeping DataCatalog immutable.
  2. Explore the feasibility of providing a public API for modifying catalog datasets and configuration parameters, so that users can adapt the pipeline's behaviour to changing runtime requirements or environmental conditions.

Relates to https://github.com/kedro-org/kedro/issues/2728

Context

  • Users need the ability to view and modify information within the Data Catalog dynamically during pipeline execution. This includes injecting dynamic data or swapping dataset implementations to accommodate varying runtime requirements.

https://github.com/Galileo-Galilei/kedro-mlflow/blob/64b8e94e1dafa02d979e7753dab9b9dfd4d7341c/kedro_mlflow/framework/hooks/mlflow_hook.py#L145
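To make the workaround concrete, here is a minimal sketch of swapping a dataset implementation at runtime through a private attribute. It does not reproduce the linked plugin code and does not use Kedro itself: `ToyDataset`, `PrefixedDataset`, `ToyCatalog`, and `swap_metrics_datasets` are hypothetical stand-ins, and the only assumption borrowed from the discussion is that the catalog keeps its datasets in a private `_datasets` mapping with no public setter.

```python
from typing import Any, Dict


class ToyDataset:
    """Hypothetical stand-in dataset that keeps its data in memory."""

    def __init__(self, data: Any = None) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class PrefixedDataset(ToyDataset):
    """Hypothetical replacement that records the name it is registered under."""

    def __init__(self, prefix: str, data: Any = None) -> None:
        super().__init__(data)
        self.prefix = prefix


class ToyCatalog:
    """Hypothetical catalog: datasets sit in a private dict, with no public setter."""

    def __init__(self, datasets: Dict[str, ToyDataset]) -> None:
        self._datasets = dict(datasets)

    def load(self, name: str) -> Any:
        return self._datasets[name].load()


def swap_metrics_datasets(catalog: ToyCatalog) -> None:
    """Hook-style function that replaces selected datasets at runtime."""
    # Iterate over a snapshot so entries can be reassigned inside the loop.
    for name, dataset in list(catalog._datasets.items()):
        if name.startswith("metrics"):
            # Private access: in this sketch there is no public way to do this.
            catalog._datasets[name] = PrefixedDataset(prefix=name, data=dataset.load())


catalog = ToyCatalog({"metrics_train": ToyDataset(0.9), "raw": ToyDataset([1, 2])})
swap_metrics_datasets(catalog)
```

The swap only works because the function reaches into `catalog._datasets`; if that attribute were renamed, the function would break without warning, which is the fragility a public API would remove.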

  • Plugin developers are interested in checking the dataset's type and injecting dynamic behaviour based on that type. They want to determine whether a dataset belongs to a certain class or type and then modify its parameters or behaviour accordingly, such as configuring it based on their environment or integration needs.

https://github.com/getindata/kedro-azureml/blob/d5c2011c7ed7fdc03235bf2bd6701f1901d1139c/kedro_azureml/hooks.py#L20
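The type-based pattern can be sketched in the same spirit. The example below is not the linked plugin's code and does not call Kedro: `CloudConfig`, `CloudAssetDataset`, `LocalDataset`, `ToyCatalog`, and `inject_cloud_config` are hypothetical names, and the private `_datasets` mapping is again an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CloudConfig:
    """Hypothetical environment-specific settings known only at runtime."""

    workspace: str
    region: str


class LocalDataset:
    """Hypothetical dataset that needs no environment configuration."""


class CloudAssetDataset:
    """Hypothetical dataset that must be configured before it is used."""

    def __init__(self, asset_name: str) -> None:
        self.asset_name = asset_name
        self.cloud_config: Optional[CloudConfig] = None


class ToyCatalog:
    """Hypothetical catalog that exposes its datasets only via a private dict."""

    def __init__(self, datasets: Dict[str, object]) -> None:
        self._datasets = dict(datasets)


def inject_cloud_config(catalog: ToyCatalog, config: CloudConfig) -> List[str]:
    """Configure every CloudAssetDataset in place and return the names touched."""
    configured = []
    for name, dataset in catalog._datasets.items():
        # Type check first, then mutate only the datasets of the target class.
        if isinstance(dataset, CloudAssetDataset):
            dataset.cloud_config = config
            configured.append(name)
    return configured


config = CloudConfig(workspace="demo-workspace", region="westeurope")
catalog = ToyCatalog({"model": CloudAssetDataset("model"), "raw": LocalDataset()})
configured = inject_cloud_config(catalog, config)
```

Here the datasets are mutated rather than replaced, but the function still depends on the private `_datasets` attribute to enumerate them. A public way to iterate over datasets and inspect their types would cover this use case.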

ElenaKhaustova, Jun 05 '24 16:06