Allow configuration of max attempts for a task
Currently, a user can attempt to run a specific task up to a maximum of 6 times. It would be beneficial to make this value configurable.
In our use case, we are integrating Argo retries with Metaflow’s retried Argo workflows. An environment variable for this limit would let us cap how many times a user can retry an Argo workflow.
Beyond our specific use case, making this value configurable would be generally useful.
Current Behaviour
from metaflow import (
    FlowSpec,
    Parameter,
    card,
    project,
    step,
    retry,
)


@project(name="dummy_project")
class HelloWorld(FlowSpec):
    force_error = Parameter("force-error", type=bool, default=False)

    @card
    @step
    def start(self):
        print("something")
        # Define the artifact that the end step prints.
        self.my_var = "something"
        self.next(self.end)

    @card
    @retry(times=10)  # exceeds the current limit of times=4
    @step
    def end(self):
        if self.force_error:
            raise Exception("Testing errors in metaflow")
        print(f"the data artifact is: {self.my_var}")


if __name__ == "__main__":
    HelloWorld()
- Running the above flow locally via
python hello_world.py run
throws the following exception:
Metaflow 2.14.0 executing HelloWorld for user:j.kollipara
Project: dummy_project, Branch: user.j.kollipara
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
Flow failed:
The maximum number of retries is @retry(times=4).
error: Recipe `_poetry-run` failed with exit code 1
Source code of the above error: https://github.com/Netflix/metaflow/blob/5c960eaff1ae486f503b37177f03cc1419b5571d/metaflow/plugins/retry_decorator.py#L30-L37
Proposed Behaviour
Setting METAFLOW_MAX_ATTEMPTS=12 would allow users to run the above flow.
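To make the arithmetic concrete, here is a minimal standalone sketch of how such a limit could be read from the environment and checked. It is not Metaflow's actual implementation: the helper names `max_attempts` and `check_retry_times` are hypothetical, and the offset of 2 between attempts and retries is only inferred from the numbers in this issue (6 attempts, `times=4`).

```python
import os

DEFAULT_MAX_ATTEMPTS = 6  # current hard-coded limit described in this issue
RESERVED_ATTEMPTS = 2     # assumption: inferred from 6 attempts vs. times=4


def max_attempts(environ=None):
    """Read the limit from METAFLOW_MAX_ATTEMPTS, falling back to 6."""
    environ = os.environ if environ is None else environ
    return int(environ.get("METAFLOW_MAX_ATTEMPTS", DEFAULT_MAX_ATTEMPTS))


def check_retry_times(times, environ=None):
    """Raise ValueError if @retry(times=...) exceeds the configured limit."""
    limit = max_attempts(environ) - RESERVED_ATTEMPTS
    if times > limit:
        raise ValueError(
            "The maximum number of retries is @retry(times=%d)." % limit
        )
    return times
```

Under these assumptions, `check_retry_times(10, {"METAFLOW_MAX_ATTEMPTS": "12"})` passes, while `check_retry_times(10, {})` raises the same message as the current behaviour.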
I have already opened a PR with the proposed change. Let me know what you think of it.
- https://github.com/Netflix/metaflow/pull/2279
I created a new PR because the old one was based on the development branch.