kedro-plugins
kedro-airflow: Extend grouping strategies
Description
https://github.com/kedro-org/kedro/issues/3094 lists a number of pain points experienced by users while deploying their Kedro projects to MLOps platforms. By default, each Kedro node is mapped 1:1 to an Airflow task.
#241 added the --group-by-memory flag to make it possible to group nodes that share MemoryDatasets between them into one Airflow task.
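To make the memory-based grouping concrete, here is a minimal sketch of the idea, not the code from #241. The Node class, the group_by_memory function and the dataset names are hypothetical stand-ins rather than Kedro or kedro-airflow APIs; the sketch only assumes that nodes touching the same in-memory dataset should end up in the same task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node: a name plus dataset names."""

    name: str
    inputs: tuple = ()
    outputs: tuple = ()


def group_by_memory(nodes, memory_datasets):
    """Group nodes that are connected through any dataset in `memory_datasets`."""
    # Union-find over node names; every node starts in its own group.
    parent = {n.name: n.name for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Merge the groups of all nodes that read or write the same in-memory dataset.
    first_seen = {}
    for n in nodes:
        for ds in (*n.inputs, *n.outputs):
            if ds not in memory_datasets:
                continue
            if ds in first_seen:
                parent[find(n.name)] = find(first_seen[ds])
            else:
                first_seen[ds] = n.name

    # Collect node names per group, preserving the original node order.
    groups = {}
    for n in nodes:
        groups.setdefault(find(n.name), []).append(n.name)
    return list(groups.values())


nodes = [
    Node("load", outputs=("raw",)),
    Node("clean", inputs=("raw",), outputs=("clean_mem",)),
    Node("features", inputs=("clean_mem",), outputs=("features_tbl",)),
    Node("train", inputs=("features_tbl",), outputs=("model",)),
]

# Only "clean_mem" is assumed to live in memory; the rest are persisted.
memory_groups = group_by_memory(nodes, memory_datasets={"clean_mem"})
print(memory_groups)
```

In this example only clean and features share an in-memory dataset, so they would become one task while load and train stay separate.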
This ticket proposes extending the grouping strategies offered by kedro-airflow.
There are some strategies we can consider:
- by pipeline
- by tags (https://getindata.com/blog/deploying-kedro-pipelines-gcp-composer-airflow-node-grouping-mlflow/ written by @Lasica)
- by namespace(?)
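As a rough illustration of what grouping by namespace or by tag could mean, the sketch below groups nodes by an arbitrary key. Everything in it is an assumption for discussion: nodes are plain dictionaries rather than Kedro objects, group_nodes is a hypothetical helper, and picking the alphabetically first tag is just one arbitrary way to resolve nodes that carry several tags.

```python
from collections import defaultdict


def group_nodes(nodes, key):
    """Group node names by `key(node)`; nodes without a key form their own group."""
    groups = defaultdict(list)
    for node in nodes:
        # Fall back to the node name so ungrouped nodes stay as single tasks.
        group_name = key(node) or node["name"]
        groups[group_name].append(node["name"])
    return dict(groups)


# Hypothetical node descriptions, not real Kedro node objects.
nodes = [
    {"name": "load", "namespace": None, "tags": {"ingest"}},
    {"name": "clean", "namespace": "prep", "tags": {"ingest"}},
    {"name": "features", "namespace": "prep", "tags": {"model"}},
    {"name": "train", "namespace": None, "tags": {"model"}},
]

by_namespace = group_nodes(nodes, key=lambda n: n["namespace"])
# Assumption: when a node has several tags, the alphabetically first one wins.
by_tag = group_nodes(nodes, key=lambda n: min(n["tags"], default=None))

print(by_namespace)
print(by_tag)
```

A real implementation would also need to decide how to handle nodes with several matching tags and whether a grouping can introduce cycles between tasks; the sketch leaves both open.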
Suggestion
- Change the design of --group-by-memory to something like --grouping-strategy=<nodes/pipeline/memory> / --group-by=<> so that it takes an input. This will make it easy for us to add grouping strategies in the future depending on what users actually want/need.
- Gather user input on what grouping strategies would be useful.
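One way to picture the suggested design is a registry that maps the option's value to a grouping function, so a new strategy is one more dictionary entry. This is only a sketch under assumptions: GROUPING_STRATEGIES, resolve_groups and the two strategy functions are hypothetical names, not part of kedro-airflow, and "pipeline" is interpreted here as putting the whole pipeline into a single task.

```python
def one_task_per_node(node_names):
    """The 1:1 mapping: every node becomes its own task."""
    return [[name] for name in node_names]


def one_task_for_pipeline(node_names):
    """Assumed meaning of 'pipeline': all nodes run inside a single task."""
    return [list(node_names)]


# Hypothetical registry; adding a strategy means adding one entry here.
GROUPING_STRATEGIES = {
    "nodes": one_task_per_node,
    "pipeline": one_task_for_pipeline,
}


def resolve_groups(node_names, group_by="nodes"):
    """Look up the strategy named by `group_by` and apply it."""
    try:
        strategy = GROUPING_STRATEGIES[group_by]
    except KeyError:
        valid = ", ".join(sorted(GROUPING_STRATEGIES))
        raise ValueError(
            f"Unknown grouping strategy {group_by!r}; expected one of: {valid}"
        ) from None
    return strategy(node_names)


print(resolve_groups(["load", "clean", "train"], group_by="nodes"))
print(resolve_groups(["load", "clean", "train"], group_by="pipeline"))
```

With this shape, the CLI option would only need to validate its value against the registry keys, and the existing memory-based grouping could be registered as a third entry.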