kedro
kedro copied to clipboard
Provide useful error message when trying to run specific nodes within namespaces
Description
This PR addresses this issue; currently omitting a namespace when trying to run individual nodes results in an uninformative Value Error:
ValueError: Pipeline does not contain nodes named ['preprocess_companies_node'].
The changes made in this PR add suggestion to the error description, so that when a user tries to run a node without its namespace the error is more helpful:
ValueError: Pipeline does not contain nodes named ['preprocess_companies_node']. Did you mean: ['data_processing.preprocess_companies_node']?
Development notes
This was tested on the spaceflights starter:
kedro new --starter=spaceflights
Expected behaviour:
From the project directory, running kedro run -n preprocess_companies_node
should yield the following error message:
"ValueError: Pipeline does not contain nodes named ['preprocess_companies_node']. Did you mean: ['data_processing.preprocess_companies_node']?"
Important to note:
The complexity of the namespace search is O(n*m) with n being the number of nodes in a project and m being the number of arguments provided to kedro run -n
; for larger pipelines this will slow down. However, this is not the only iteration of the nodes in the pipeline implementation - is the potential slowdown a concern?
Checklist
- [x] Read the contributing guidelines
- [x] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [ ] Updated the documentation to reflect the code changes
- [x] Added a description of this change in the
RELEASE.md
file - [x] Added tests to cover my changes
This is great! I've wanted this for so so long 💪
One last thought - perhaps a NameError is more appropriate here
One last thought - perhaps a NameError is more appropriate here
@datajoely Would this impact consistency? Where kedro run -n <incorrect_node> -n <missing_namespace>
would yield a NameError but kedro run -n <incorrect_node>
yields a ValueError?
One last thought - perhaps a NameError is more appropriate here
@datajoely Would this impact consistency? Where
kedro run -n <incorrect_node> -n <missing_namespace>
would yield a NameError butkedro run -n <incorrect_node>
yields a ValueError?
To add to this, we use ValueError
across Kedro for exactly this sort of error (e.g. invalid pipeline name). It's very common practice in Python to reserve NameError
for actual missing variables and to use ValueError
to include the case here that "the value you supplied refers to something that does not exist". So ValueError
is definitely more appropriate here.
It's also technically a breaking change to change an exception type, because someone's workflow might rely on catching a certain exception type (very unlikely to be an issue here in practice, but good to think about).
Gotcha it was more of a hairbrained thought than anything more :)