kedro
kedro copied to clipboard
Investigate further inconsistencies between the CLI and Kedro API
Description
#2014 Explored inconsistencies within the CLI itself, but it stands that the CLI is tightly coupled with the API. This ticket looks at investigating any inconsistencies in this wider context with the aim of streamlining our user experience. For example:
- Parameters in KedroSession.run() vs CLI
kedro run
- Parameter names in
config.yml
supplied tokedro run
vs CLIkedro run
flags
This list is far from exhaustive, but serves to highlight areas of consideration when conducting this investigation.
Potentially useful: https://clig.dev/
Copied from https://github.com/kedro-org/kedro/pull/2301#discussion_r1104266236:
Fine to keep
node_names
for now in #2245 but as part of https://github.com/kedro-org/kedro/issues/2247 I think we should consider just calling thisnodes
. (Just while I remember, Ivan's argument for why we might not want to do this: it's deliberately written asnode_names
to deter people from trying to pass node objects themselves to the underlyingsession.run
. My argument for it is consistency and simplicity, particularly when we eventually return to https://github.com/kedro-org/kedro/pull/1423)
From discussion with @jmholzer and @noklam in https://github.com/kedro-org/kedro/pull/2306: there's inconsistent ordering between our click definition of run
and the Python KedroSession.run
API. This isn't a huge problem but would be nice to have them the same.
Proposed solution:
- Decide what the correct order of arguments is in click definition of
run
(which determines order flags appear inkedro run --help
). Maybe this is already correct - Re-order arguments in
KedroSession.run
to match - Make
KedroSession.run
arguments keyword-only (i.e.def run (self, *, ...)
) so we can alter the order or insert new arguments wherever we see fit in the future without it being a breaking change
I doubt anyone calls KedroSession.run
without using keywords at the moment, but points 2 and 3 above are breaking changes so should only be done in 0.19. It would make sense to do any renaming of arguments at the same time as this in 0.19 anyway.
Probably the same applies to pipeline.filter
and that should become keyword-only and match also.
wdyt @jmholzer @noklam?
I like the keyword-only idea and it should make things more flexible, if we shuffle the order in the future it won't be a breaking change anymore, is it correct?
On 1. we may leverage the HEAP statistic to arrange them according to usage, but the same group of arguments shoul go together, i.e. from-inputs
, to-outputs
. I expect the top arguments would be --pipeline
, --env
, --params
,--tags
I agree that 2/3 should be done in 0.19, I don't know if people are using KedroSession.run
without the keywords, it's still possible since the most common argument pipeline_name
is the first argument. I would stay on the safe side and do it in 0.19
I like the keyword-only idea and it should make things more flexible, if we shuffle the order in the future it won't be a breaking change anymore, is it correct?
Exactly. We should do the same for kedro-datasets at some point, since those are called (almost?) always with keyword arguments anyway, and it means we can reorder arguments and add new ones it however we like in the future. At the moment they're a bit of a mess.
On 1. we may leverage the HEAP statistic to arrange them according to usage, but the same group of arguments shoul go together, i.e.
from-inputs
,to-outputs
. I expect the top arguments would be--pipeline
, --env,
--params,
--tags`
Looking at the HEAP statistics is a great idea here although maybe there's the question about correlation vs. causation: what if some arguments are already used more frequently more frequently because of the position they appear in kedro run --help
🤔 I doubt that's really the case though, so fully agree about re-ordering them according to how popular they are and grouping together similar arguments.
I agree that 2/3 should be done in 0.19, I don't know if people are using
KedroSession.run
without the keywords, it's still possible since the most common argumentpipeline_name
is the first argument. I would stay on the safe side and do it in 0.19
👍