zenml
[BUG]: Config.yaml step config only used in first step when calling step multiple times
Contact Details [Optional]
No response
System Information
zenml 0.50.0
What happened?
We have a pipeline with a step named run_model:
    @step
    def run_model(X_train: np.ndarray, y_train: np.ndarray, X_test: np.ndarray,
                  y_test: np.ndarray, name: str, configuration: Dict):
        ...  # step body omitted in the report
Using the new pipeline/step syntax, it is called multiple times:
    for model in models:
        if model_config[model]['active']:
            run_model(X_train, y_train, X_test, y_test, model, configuration, id=model)
We're using a config.yaml based configuration for the step:
    run_model:
      enable_cache: false
      experiment_tracker: "trackername"
      settings:
        experiment_tracker.mlflow:
          experiment_name: "experimentname"
          nested: True
However, the configuration is only applied to run_model, not to run_model_2, run_model_3 and run_model_4, whose names are generated automatically.
Is this a bug? If not, how can we avoid it other than by manually repeating the config for each invocation (which would be redundant and not DRY)?
Thanks!
Reproduction steps
...
Relevant log output
No response
To use the same step instance and configuration, don't specify the id parameter when calling run_model. This will reuse the same step instance each time:
    for model in models:
        if model_config[model]['active']:
            run_model(X_train, y_train, X_test, y_test, model, configuration)
Or I think you can create the step instance once and reuse it:
    model_step = run_model.with_id("model")
    for model in models:
        if model_config[model]['active']:
            model_step(X_train, y_train, X_test, y_test, model, configuration)
This isn't a bug: a new step instance is created each time run_model is called with a different id. To reuse the configuration, you need to reuse the same step instance.
It's not clear whether this is considered a bug or not.
Same as @christianversloot, my expectation was that the configuration would be used across all invocations of a step, but it isn't. I think it would be best to change the behavior so that, by default, all invocations of a step use the same configuration regardless of their dynamic suffix (i.e., _1, _2, etc.).
In my case, I have a step that I invoke twice, passing two different values for one of its parameters and expecting all other parameters to be shared as defined in the configuration. My current workaround for sharing them is YAML anchor notation. For example:
    steps:
      my_step:
        parameters: &my_step_params
          shared_param_1: "value_1"
          shared_param_2: "value_2"
      my_step_2:
        parameters: *my_step_params
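If the number of invocations varies, the anchor approach still means editing the YAML by hand for each one. Another option might be to build the repeated entries in Python instead. Below is a minimal sketch of that idea using only the standard library. The helper name expand_step_config is hypothetical and not part of zenml, and it assumes the naming pattern described in this thread (the first call keeps the plain step name, later calls get _2, _3, ...). How the resulting dictionary is handed to the pipeline is left open here.

```python
import copy
from typing import Any, Dict


def expand_step_config(
    base_name: str, step_config: Dict[str, Any], invocations: int
) -> Dict[str, Dict[str, Any]]:
    """Repeat one step config under each auto-generated invocation name.

    Hypothetical helper, not a zenml API. Assumes the first invocation keeps
    the plain step name and later ones are suffixed with _2, _3, ...
    """
    expanded: Dict[str, Dict[str, Any]] = {}
    for index in range(1, invocations + 1):
        name = base_name if index == 1 else f"{base_name}_{index}"
        # Deep-copy so that changing one invocation's config later does not
        # silently change the others.
        expanded[name] = copy.deepcopy(step_config)
    return expanded


# Same shared parameters as in the YAML anchor example above.
shared = {
    "parameters": {"shared_param_1": "value_1", "shared_param_2": "value_2"}
}
config = {"steps": expand_step_config("my_step", shared, invocations=2)}
```

The result has the same shape as the YAML above once the anchor is resolved, so it could be written out as a config file or adjusted per invocation before use.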