metaflow
[WIP] feature: support argo retry
This PR modifies the step CLI command to infer retry_count from the flow_datastore class, which appears to hold all the information about the underlying datastore and run artifacts.
- Adds a `flow_datastore` class method that infers the latest done attempt of a task.
- Removes `MF_ATTEMPT` from the `mflog.save_logs` Python script and infers the attempt from the datastore instead.
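To illustrate the idea of inferring the attempt from the datastore, here is a minimal, self-contained sketch. It is not the code in this PR: the function name `latest_done_attempt`, the `<attempt>.DONE.lock` marker naming, and the cap of 6 are assumptions made for illustration, and the real datastore API may look different.

```python
from typing import Iterable, Optional

# Assumed cap on attempts; the PR description mentions a limit of 6.
MAX_ATTEMPTS = 6


def latest_done_attempt(
    marker_names: Iterable[str], max_attempts: int = MAX_ATTEMPTS
) -> Optional[int]:
    """Return the highest attempt number that has a done marker, or None.

    `marker_names` is a hypothetical listing of per-task metadata files,
    assumed to be named "<attempt>.DONE.lock" for finished attempts.
    """
    done = set()
    for name in marker_names:
        # Split "1.DONE.lock" into prefix "1" and suffix "DONE.lock".
        prefix, _, suffix = name.partition(".")
        if suffix == "DONE.lock" and prefix.isdigit():
            done.add(int(prefix))

    # Scan from the highest allowed attempt downwards.
    for attempt in range(max_attempts - 1, -1, -1):
        if attempt in done:
            return attempt
    return None


if __name__ == "__main__":
    listing = [
        "0.attempt.json",
        "0.DONE.lock",
        "1.attempt.json",
        "1.DONE.lock",
        "2.attempt.json",  # attempt 2 started but has no done marker
    ]
    print(latest_done_attempt(listing))
```

With a lookup like this, the log-saving script would no longer need the attempt number passed in through an environment variable such as `MF_ATTEMPT`.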
Resolves: https://github.com/Netflix/metaflow/issues/2278
A known limitation of this solution:
- Retrying more than `6` times (or, in the current case, hitting the Argo retry button twice) results in a Metaflow exception.
We plan to address this limitation by making MAX_ATTEMPTS configurable via an environment variable.
- https://github.com/Netflix/metaflow/pull/2279
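
As a rough sketch of what an environment-configurable limit could look like (not the implementation in the linked PR): the variable name `METAFLOW_MAX_ATTEMPTS` and the helper names `resolve_max_attempts` and `check_attempt` are hypothetical, and the default of 6 is taken from the limit described above.

```python
import os
from typing import Mapping, Optional

# Default taken from the limit described in this PR.
DEFAULT_MAX_ATTEMPTS = 6


def resolve_max_attempts(
    environ: Optional[Mapping[str, str]] = None,
    var_name: str = "METAFLOW_MAX_ATTEMPTS",  # hypothetical variable name
) -> int:
    """Read the attempt cap from the environment, falling back to the default."""
    env = os.environ if environ is None else environ
    raw = env.get(var_name)
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_ATTEMPTS
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{var_name} must be an integer, got {raw!r}")
    if value < 1:
        raise ValueError(f"{var_name} must be at least 1, got {value}")
    return value


def check_attempt(attempt: int, max_attempts: int) -> None:
    """Raise if a zero-based attempt number exceeds the configured cap."""
    if attempt >= max_attempts:
        raise RuntimeError(
            f"Attempt {attempt} exceeds the maximum of {max_attempts} attempts"
        )


if __name__ == "__main__":
    limit = resolve_max_attempts({"METAFLOW_MAX_ATTEMPTS": "12"})
    check_attempt(7, limit)  # allowed once the cap is raised to 12
    print(limit)
```

Passing the environment in as a mapping keeps the helper easy to test without mutating `os.environ`.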