kedro icon indicating copy to clipboard operation
kedro copied to clipboard

Investigate further inconsistencies between the CLI and Kedro API

Open AhdraMeraliQB opened this issue 2 years ago • 5 comments

Description

#2014 Explored inconsistencies within the CLI itself, but it stands that the CLI is tightly coupled with the API. This ticket looks at investigating any inconsistencies in this wider context with the aim of streamlining our user experience. For example:

  • Parameters in KedroSession.run() vs CLI kedro run
  • Parameter names in config.yml supplied to kedro run vs CLI kedro run flags

This list is far from exhaustive, but serves to highlight areas of consideration when conducting this investigation.

AhdraMeraliQB avatar Jan 23 '23 17:01 AhdraMeraliQB

Potentially useful: https://clig.dev/

astrojuanlu avatar Feb 06 '23 14:02 astrojuanlu

Copied from https://github.com/kedro-org/kedro/pull/2301#discussion_r1104266236:

Fine to keep node_names for now in #2245 but as part of https://github.com/kedro-org/kedro/issues/2247 I think we should consider just calling this nodes. (Just while I remember, Ivan's argument for why we might not want to do this: it's deliberately written as node_names to deter people from trying to pass node objects themselves to the underlying session.run. My argument for it is consistency and simplicity, particularly when we eventually return to https://github.com/kedro-org/kedro/pull/1423)

antonymilne avatar Feb 14 '23 06:02 antonymilne

From discussion with @jmholzer and @noklam in https://github.com/kedro-org/kedro/pull/2306: there's inconsistent ordering between our click definition of run and the Python KedroSession.run API. This isn't a huge problem but would be nice to have them the same.

Proposed solution:

  1. Decide what the correct order of arguments is in click definition of run (which determines order flags appear in kedro run --help). Maybe this is already correct
  2. Re-order arguments in KedroSession.run to match
  3. Make KedroSession.run arguments keyword-only (i.e. def run (self, *, ...)) so we can alter the order or insert new arguments wherever we see fit in the future without it being a breaking change

I doubt anyone calls KedroSession.run without using keywords at the moment, but points 2 and 3 above are breaking changes so should only be done in 0.19. It would make sense to do any renaming of arguments at the same time as this in 0.19 anyway.

Probably the same applies to pipeline.filter and that should become keyword-only and match also.

wdyt @jmholzer @noklam?

antonymilne avatar Feb 14 '23 06:02 antonymilne

I like the keyword-only idea and it should make things more flexible, if we shuffle the order in the future it won't be a breaking change anymore, is it correct?

On 1. we may leverage the HEAP statistic to arrange them according to usage, but the same group of arguments shoul go together, i.e. from-inputs, to-outputs. I expect the top arguments would be --pipeline, --env, --params,--tags

I agree that 2/3 should be done in 0.19, I don't know if people are using KedroSession.run without the keywords, it's still possible since the most common argument pipeline_name is the first argument. I would stay on the safe side and do it in 0.19

noklam avatar Feb 16 '23 07:02 noklam

I like the keyword-only idea and it should make things more flexible, if we shuffle the order in the future it won't be a breaking change anymore, is it correct?

Exactly. We should do the same for kedro-datasets at some point, since those are called (almost?) always with keyword arguments anyway, and it means we can reorder arguments and add new ones it however we like in the future. At the moment they're a bit of a mess.

On 1. we may leverage the HEAP statistic to arrange them according to usage, but the same group of arguments shoul go together, i.e. from-inputs, to-outputs. I expect the top arguments would be --pipeline, --env, --params,--tags`

Looking at the HEAP statistics is a great idea here although maybe there's the question about correlation vs. causation: what if some arguments are already used more frequently more frequently because of the position they appear in kedro run --help 🤔 I doubt that's really the case though, so fully agree about re-ordering them according to how popular they are and grouping together similar arguments.

I agree that 2/3 should be done in 0.19, I don't know if people are using KedroSession.run without the keywords, it's still possible since the most common argument pipeline_name is the first argument. I would stay on the safe side and do it in 0.19

👍

antonymilne avatar Feb 16 '23 09:02 antonymilne