kedro-mlflow
Handling of Exceptions in MLPipeline
Description
Exceptions raised inside a PipelineML are overshadowed by a NotImplementedError, which makes debugging harder than necessary.
Context
This bug only occurs when an exception is raised in the training pipeline of a PipelineML. It is not critical, since the relevant error message is still shown above the NotImplementedError traceback.
Steps to Reproduce
If required, I can prepare a better example, but this should be enough to reproduce the issue.
- Add raise ValueError("My debug message") to any node that is part of the training pipeline of a PipelineML, using kedro > 0.11.
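The sequence that produces the double traceback can be simulated without kedro. This is a minimal stdlib-only sketch under stated assumptions: FakePipelineML, suggest_resume_scenario and run_sequentially are hypothetical stand-ins that only mimic the shape of SequentialRunner._run and Runner._suggest_resume_scenario as they appear in the traceback in this report; they are not the real kedro or kedro-mlflow code.

```python
# Hypothetical stand-ins that mimic the traceback; not the real kedro classes.


class FakePipelineML:
    """Mimics PipelineML: filtering the pipeline is deliberately unsupported."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def only_nodes(self, *node_names):
        raise NotImplementedError("This method is not implemented for 'PipelineML'.")


def suggest_resume_scenario(pipeline, done_nodes):
    # Like the runner's resume hint, the pipeline is only filtered
    # when at least one node has already completed.
    if done_nodes:
        remaining = [n.__name__ for n in pipeline.nodes if n not in done_nodes]
        pipeline.only_nodes(*remaining)


def run_sequentially(pipeline):
    done_nodes = set()
    for node in pipeline.nodes:
        try:
            node()
            done_nodes.add(node)
        except Exception:
            # Build the resume hint, then re-raise the node's own error.
            # If the hint itself raises, that new error replaces the original.
            suggest_resume_scenario(pipeline, done_nodes)
            raise


def preprocess():
    return "ok"


def train():
    raise ValueError("My debug message")


try:
    run_sequentially(FakePipelineML([preprocess, train]))
except Exception as exc:
    caught = exc

print(type(caught).__name__)     # NotImplementedError
print(repr(caught.__context__))  # ValueError('My debug message')
```

The original ValueError survives as the `__context__` of the NotImplementedError, which is why Python prints it first, followed by "During handling of the above exception, another exception occurred". In this stand-in, if the very first node fails, done_nodes is empty, nothing is filtered, and the ValueError surfaces unchanged.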
Expected Result
I expect a ValueError with the message "My debug message" to be raised. However, kedro also previews a command to resume the run from the failed nodes, and this preview is what actually causes the issue.
Actual Result
During handling of the above exception, another exception occurred
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ kedro:8 in <module> │
│ │
│ 5 from kedro.framework.cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ python3.9/site-packages/kedro/framework/cli/cli.py:211 in main │
│ │
│ 208 │ """ │
│ 209 │ _init_plugins() │
│ 210 │ cli_collection = KedroCLI(project_path=Path.cwd()) │
│ ❱ 211 │ cli_collection() │
│ 212 │
│ │
│ python3.9/site-packages/click/core.py:1130 in __call__ │
│ │
│ python3.9/site-packages/kedro/framework/cli/cli.py:139 in main │
│ │
│ 136 │ │ ) │
│ 137 │ │ │
│ 138 │ │ try: │
│ ❱ 139 │ │ │ super().main( │
│ 140 │ │ │ │ args=args, │
│ 141 │ │ │ │ prog_name=prog_name, │
│ 142 │ │ │ │ complete_var=complete_var, │
│ │
│ python3.9/site-packages/click/core.py:1055 in main │
│ │
│ python3.9/site-packages/click/core.py:1657 in invoke │
│ │
│ python3.9/site-packages/click/core.py:1404 in invoke │
│ │
│ python3.9/site-packages/click/core.py:760 in invoke │
│ │
│ python3.9/site-packages/kedro/framework/cli/project.py:366 in run │
│ │
│ 363 │ node_names = _get_values_as_tuple(node_names) if node_names else node_names │
│ 364 │ │
│ 365 │ with KedroSession.create(env=env, extra_params=params) as session: │
│ ❱ 366 │ │ session.run( │
│ 367 │ │ │ tags=tag, │
│ 368 │ │ │ runner=runner(is_async=is_async), │
│ 369 │ │ │ node_names=node_names, │
│ │
│ python3.9/site-packages/kedro/framework/session/session.py:407 in run │
│ │
│ 404 │ │ ) │
│ 405 │ │ │
│ 406 │ │ try: │
│ ❱ 407 │ │ │ run_result = runner.run( │
│ 408 │ │ │ │ filtered_pipeline, catalog, hook_manager, session_id │
│ 409 │ │ │ ) │
│ 410 │ │ │ self._run_called = True │
│ │
│ python3.9/site-packages/kedro/runner/runner.py:88 in run │
│ │
│ 85 │ │ │ self._logger.info( │
│ 86 │ │ │ │ "Asynchronous mode is enabled for loading and saving data" │
│ 87 │ │ │ ) │
│ ❱ 88 │ │ self._run(pipeline, catalog, hook_manager, session_id) │
│ 89 │ │ │
│ 90 │ │ self._logger.info("Pipeline execution completed successfully.") │
│ 91 │
│ │
│ python3.9/site-packages/kedro/runner/sequential_runner.py:73 in _run │
│ │
│ 70 │ │ │ │ run_node(node, catalog, hook_manager, self._is_async, session_id) │
│ 71 │ │ │ │ done_nodes.add(node) │
│ 72 │ │ │ except Exception: │
│ ❱ 73 │ │ │ │ self._suggest_resume_scenario(pipeline, done_nodes, catalog) │
│ 74 │ │ │ │ raise │
│ 75 │ │ │ │
│ 76 │ │ │ # decrement load counts and release any data sets we've finished with │
│ │
│ python3.9/site-packages/kedro/runner/runner.py:186 in _suggest_resume_scenario │
│ │
│ 183 │ │ postfix = "" │
│ 184 │ │ if done_nodes: │
│ 185 │ │ │ node_names = (n.name for n in remaining_nodes) │
│ ❱ 186 │ │ │ resume_p = pipeline.only_nodes(*node_names) │
│ 187 │ │ │ start_p = resume_p.only_nodes_with_inputs(*resume_p.inputs()) │
│ 188 │ │ │ │
│ 189 │ │ │ # find the nearest persistent ancestors of the nodes in start_p │
│ │
│ python3.9/site-packages/kedro_mlflow/pipeline/pipeline_ml.py:173 in only_nodes │
│ │
│ 170 │ │ ) │
│ 171 │ │
│ 172 │ def only_nodes(self, *node_names: str) -> "Pipeline": # pragma: no cover │
│ ❱ 173 │ │ raise NotImplementedError(MSG_NOT_IMPLEMENTED) │
│ 174 │ │
│ 175 │ def only_nodes_with_namespace( │
│ 176 │ │ self, node_namespace: str │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: This method is not implemented because it does not make sense for 'PipelineML'. Manipulate directly the training pipeline and recreate the 'PipelineML' with 'pipeline_ml_factory' factory.
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
- kedro and kedro-mlflow version used (pip show kedro and pip show kedro-mlflow): 0.18.3 and 0.11.4
- Python version used (python -V): 3.9
- Operating system and version: macOS 12.5.1
Does the bug also happen with the last version on master?
Yes, tried it out
Thank you in advance for your support. My suggestion would be to simply call only_nodes on the training pipeline of the PipelineML.
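The suggested delegation can be sketched with stand-in classes. FakePipeline, FakePipelineML and their attributes are hypothetical and only illustrate the idea; the real PipelineML in kedro-mlflow is built with pipeline_ml_factory and is not implemented this way.

```python
# Hypothetical stand-ins illustrating delegation; not the real kedro API.


class FakePipeline:
    def __init__(self, node_names):
        self.node_names = list(node_names)

    def only_nodes(self, *node_names):
        missing = set(node_names) - set(self.node_names)
        if missing:
            raise ValueError(f"Pipeline does not contain nodes named {sorted(missing)}")
        # Keep the stand-in's original node order.
        return FakePipeline(n for n in self.node_names if n in node_names)


class FakePipelineML(FakePipeline):
    def __init__(self, training, inference):
        super().__init__(training.node_names)
        self.training = training
        self.inference = inference

    def only_nodes(self, *node_names):
        # Delegate to the training pipeline instead of raising. The result is
        # a plain pipeline, so anything tied to the PipelineML object is lost.
        return self.training.only_nodes(*node_names)


training = FakePipeline(["preprocess", "train", "evaluate"])
inference = FakePipeline(["preprocess", "predict"])
resumed = FakePipelineML(training, inference).only_nodes("train", "evaluate")
print(type(resumed).__name__, resumed.node_names)
# FakePipeline ['train', 'evaluate']
```

The filtered object is a plain pipeline rather than a PipelineML, so a run built from it would no longer behave like the full PipelineML.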
Hi @daniel-ressi, you're not the first one to notice this behaviour. Unfortunately, kedro filters your pipeline to suggest a resume scenario, and this breaks the PipelineML object. This is deliberate: you should not use the suggested command, because it will not work with a PipelineML, which assumes you are running the entire pipeline and not only part of it.
However, given how annoying this stacktrace is, I am considering changing the behaviour and only issuing a warning. The risk is that some people will run their entire pipeline before noticing that the PipelineML object does not work as intended.
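A minimal sketch of the warning-only alternative, assuming only_nodes simply returns a filtered object and emits a UserWarning. The class and the warning text are hypothetical, not kedro-mlflow's actual implementation or wording.

```python
import warnings


class FakePipelineML:
    """Hypothetical stand-in for PipelineML; not the kedro-mlflow class."""

    def __init__(self, node_names):
        self.node_names = list(node_names)

    def only_nodes(self, *node_names):
        # Warn instead of raising NotImplementedError, so an error raised
        # by a node is no longer masked while kedro builds its resume hint.
        warnings.warn(
            "Filtering a PipelineML does not produce a working PipelineML; "
            "rerun the entire pipeline instead of the suggested command.",
            UserWarning,
            stacklevel=2,
        )
        return FakePipelineML(n for n in self.node_names if n in node_names)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    filtered = FakePipelineML(["train", "evaluate"]).only_nodes("evaluate")

print(filtered.node_names)          # ['evaluate']
print(caught[0].category.__name__)  # UserWarning
```

The trade-off is visible here: the warning is easy to overlook in a long log, which is exactly the risk of someone running the whole filtered pipeline before realizing it is not a working PipelineML.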
I will try to find a way to keep the stacktrace readable, but I have no straightforward solution for now, sorry.
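One possible direction, sketched with hypothetical stand-ins: treat the resume hint as best-effort, so a failure while building it is logged and the node's original exception is re-raised. This is not how kedro 0.18.3 behaves (there, the hint's exception replaces the original one); FakePipelineML, run_node_with_guarded_hint and the logger name are assumptions for illustration only.

```python
import logging

logger = logging.getLogger("resume_hint_sketch")  # hypothetical logger name


class FakePipelineML:
    """Stand-in whose filtering method fails, like PipelineML.only_nodes."""

    def only_nodes(self, *node_names):
        raise NotImplementedError("This method is not implemented for 'PipelineML'.")


def run_node_with_guarded_hint(pipeline, node, remaining_names):
    try:
        node()
    except Exception:
        try:
            # Best-effort resume hint: filter the pipeline like the runner does.
            pipeline.only_nodes(*remaining_names)
        except Exception as hint_error:
            logger.warning("Could not suggest a resume scenario: %s", hint_error)
        # The inner handler has finished, so a bare raise re-raises the
        # node's exception, not the hint's.
        raise


def train():
    raise ValueError("My debug message")


try:
    run_node_with_guarded_hint(FakePipelineML(), train, ["train"])
except Exception as exc:
    surfaced = exc

print(repr(surfaced))  # ValueError('My debug message')
```

With this shape, the user sees the node's own error as the final exception, plus a one-line log message instead of a second full traceback.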
Thanks for your swift response. Is the issue that kedro's resume scenario would only run (part of) the training pipeline and not the PipelineML? I would upvote a solution that just warns the user about these implications.
Ideally, it would be possible to disable the resume scenario suggestion for a PipelineML run, but this does not seem possible, since the suggestion is not triggered through a hook but by the Runner itself.
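Since the suggestion lives in the runner, one workaround is a runner subclass that overrides the suggestion step. This sketch uses stand-in classes only: in kedro 0.18.3 the corresponding method is the private Runner._suggest_resume_scenario seen in the traceback, so overriding it in a real custom runner would rely on a non-public API that may change between versions. FakeSequentialRunner and NoResumeHintRunner are hypothetical.

```python
class FakePipelineML:
    """Hypothetical stand-in: filtering is unsupported, like PipelineML."""

    def only_nodes(self, *node_names):
        raise NotImplementedError("This method is not implemented for 'PipelineML'.")


class FakeSequentialRunner:
    """Stand-in runner that builds a resume hint when a node fails."""

    def run(self, pipeline, nodes):
        done_nodes = []
        for node in nodes:
            try:
                node()
                done_nodes.append(node)
            except Exception:
                self._suggest_resume_scenario(pipeline, nodes, done_nodes)
                raise

    def _suggest_resume_scenario(self, pipeline, nodes, done_nodes):
        if done_nodes:
            remaining = [n.__name__ for n in nodes if n not in done_nodes]
            pipeline.only_nodes(*remaining)


class NoResumeHintRunner(FakeSequentialRunner):
    """Hypothetical subclass that skips the resume hint entirely."""

    def _suggest_resume_scenario(self, pipeline, nodes, done_nodes):
        # Do nothing, so only_nodes is never called and the node's error surfaces.
        return None


def preprocess():
    return "ok"


def train():
    raise ValueError("My debug message")


errors = {}
for runner_cls in (FakeSequentialRunner, NoResumeHintRunner):
    try:
        runner_cls().run(FakePipelineML(), [preprocess, train])
    except Exception as exc:
        errors[runner_cls.__name__] = type(exc).__name__

print(errors)
# {'FakeSequentialRunner': 'NotImplementedError', 'NoResumeHintRunner': 'ValueError'}
```

The cost of this approach is that the resume hint disappears for every pipeline run with that runner, not only for PipelineML runs.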
Either way, great work @Galileo-Galilei!