
[BUG]: Config.yaml step config only used in first step when calling step multiple times

Open christianversloot opened this issue 1 year ago • 3 comments

Contact Details [Optional]

No response

System Information

zenml 0.50.0

What happened?

We have a pipeline with a step named run_model:

@step
def run_model(X_train: np.ndarray, y_train: np.ndarray, X_test: np.ndarray,
              y_test: np.ndarray, name: str, configuration: Dict):
    ...  # step body omitted

Using the new pipeline/step syntax, it is called multiple times:

for model in models:
    if model_config[model]['active']:
        run_model(X_train, y_train, X_test, y_test, model, configuration, id=model)

We're using a config.yaml based configuration for the step:

run_model:
    enable_cache: false
    experiment_tracker: "trackername"
    settings:
      experiment_tracker.mlflow:
        experiment_name: "experimentname"
        nested: True

However, the configuration is only applied to run_model, not to run_model_2, run_model_3 and run_model_4, whose names are generated automatically.
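To illustrate the symptom, here is a toy model of an exact-name config lookup. It is only a sketch of the behaviour described above, not ZenML's actual code; the dictionary and the lookup are assumptions made for illustration, and the invocation names are the ones reported in this issue.

```python
# Toy model of the reported behaviour; this is NOT ZenML's implementation.
# Config keyed by step name, mirroring the config.yaml excerpt above.
step_configs = {
    "run_model": {"enable_cache": False, "experiment_tracker": "trackername"},
}

# Invocation names as reported in this issue.
invocations = ["run_model", "run_model_2", "run_model_3", "run_model_4"]

# An exact-name lookup only finds a config for the first invocation.
resolved = {name: step_configs.get(name) for name in invocations}

for name, config in resolved.items():
    print(name, "->", config)
```

If the lookup works roughly like this, every suffixed invocation falls back to defaults unless it has its own entry.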

Is this a bug? If not, how can we avoid this without manually specifying the config multiple times (which would be redundant / not DRY)?

Thanks!


Reproduction steps

...

Relevant log output

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

christianversloot, Dec 13 '23 11:12

To use the same step instance and configuration, don't specify the id parameter when calling run_model. This will reuse the same step instance each time:

for model in models:
  if model_config[model]['active']:
    run_model(X_train, y_train, X_test, y_test, model, configuration)

Or I think you can create the step instance once and reuse it:

model_step = run_model.with_id("model")

for model in models:
  if model_config[model]['active']:
    model_step(X_train, y_train, X_test, y_test, model, configuration)

This isn't a bug - it's just creating new step instances each time run_model is called with a different id. To reuse the configuration, you need to reuse the same step instance.

Vishal-Padia, Dec 22 '23 14:12

It's not clear whether this is considered a bug or not.

Same as @christianversloot, my expectation was that the configuration would be applied to all invocations of a step, but it isn't. I think it would be best if, by default, all invocations of a step used the same configuration regardless of their dynamic suffix (i.e., _1, _2, etc.).

In my case, I have a step that I invoke twice, passing two different values for one of its parameters and expecting all other parameters to be shared as defined in the configuration. My current workaround is to use YAML anchors. For example:

steps:
  my_step:
    parameters: &my_step_params
      shared_param_1: "value_1"
      shared_param_2: "value_2"
  my_step_2:
    parameters: *my_step_params
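
The same duplication could also be generated in Python rather than written by hand. This is only a minimal sketch: `expand_step_config` is a hypothetical helper (not part of ZenML), and it assumes the invocation names follow the `my_step`, `my_step_2`, ... pattern seen in this thread. The output is printed as JSON, which YAML parsers generally accept.

```python
import copy
import json


def expand_step_config(step_name, shared_config, invocation_count):
    """Copy one shared config to every assumed invocation name.

    Hypothetical helper: assumes names are step_name, step_name_2, ...
    """
    names = [step_name] + [
        f"{step_name}_{i}" for i in range(2, invocation_count + 1)
    ]
    # Deep-copy so that later edits to one entry do not leak into the others.
    return {"steps": {name: copy.deepcopy(shared_config) for name in names}}


shared = {
    "parameters": {"shared_param_1": "value_1", "shared_param_2": "value_2"},
}
config = expand_step_config("my_step", shared, invocation_count=2)

# Dump the expanded config so it can be saved to a config file.
print(json.dumps(config, indent=2))
```

Unlike anchors, this keeps the shared values in one place in code, at the cost of having to know the number of invocations up front.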

ConX, Aug 22 '24 01:08