kedro icon indicating copy to clipboard operation
kedro copied to clipboard

Provide useful error message when trying to run specific nodes within namespaces

Open AhdraMeraliQB opened this issue 2 years ago • 1 comments

Description

This PR addresses this issue; currently omitting a namespace when trying to run individual nodes results in an uninformative Value Error:

ValueError: Pipeline does not contain nodes named ['preprocess_companies_node'].

The changes made in this PR add suggestion to the error description, so that when a user tries to run a node without its namespace the error is more helpful:

ValueError: Pipeline does not contain nodes named ['preprocess_companies_node']. Did you mean: ['data_processing.preprocess_companies_node']?

Development notes

This was tested on the spaceflights starter: kedro new --starter=spaceflights

Expected behaviour: From the project directory, running kedro run -n preprocess_companies_node should yield the following error message: "ValueError: Pipeline does not contain nodes named ['preprocess_companies_node']. Did you mean: ['data_processing.preprocess_companies_node']?"

Important to note: The complexity of the namespace search is O(n*m) with n being the number of nodes in a project and m being the number of arguments provided to kedro run -n; for larger pipelines this will slow down. However, this is not the only iteration of the nodes in the pipeline implementation - is the potential slowdown a concern?

Checklist

  • [x] Read the contributing guidelines
  • [x] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • [ ] Updated the documentation to reflect the code changes
  • [x] Added a description of this change in the RELEASE.md file
  • [x] Added tests to cover my changes

AhdraMeraliQB avatar Aug 09 '22 13:08 AhdraMeraliQB

This is great! I've wanted this for so so long 💪

datajoely avatar Aug 09 '22 14:08 datajoely

One last thought - perhaps a NameError is more appropriate here

datajoely avatar Aug 11 '22 11:08 datajoely

One last thought - perhaps a NameError is more appropriate here

@datajoely Would this impact consistency? Where kedro run -n <incorrect_node> -n <missing_namespace> would yield a NameError but kedro run -n <incorrect_node> yields a ValueError?

AhdraMeraliQB avatar Aug 11 '22 13:08 AhdraMeraliQB

One last thought - perhaps a NameError is more appropriate here

@datajoely Would this impact consistency? Where kedro run -n <incorrect_node> -n <missing_namespace> would yield a NameError but kedro run -n <incorrect_node> yields a ValueError?

To add to this, we use ValueError across Kedro for exactly this sort of error (e.g. invalid pipeline name). It's very common practice in Python to reserve NameError for actual missing variables and to use ValueError to include the case here that "the value you supplied refers to something that does not exist". So ValueError is definitely more appropriate here.

It's also technically a breaking change to change an exception type, because someone's workflow might rely on catching a certain exception type (very unlikely to be an issue here in practice, but good to think about).

antonymilne avatar Aug 11 '22 15:08 antonymilne

Gotcha it was more of a hairbrained thought than anything more :)

datajoely avatar Aug 11 '22 15:08 datajoely