
[backend] KFP v2 currently doesn't support "Retry run" function

jlyaoyuli opened this issue.

In KFP v2, the Retry Run function has not been implemented yet. Currently, when users click the Retry button, they are blocked by this error message: https://screenshot.googleplex.com/3rFyhstqcjeyXAG.

Refer to: https://github.com/kubeflow/pipelines/blob/master/backend/src/apiserver/resource/resource_manager.go#L557-L559

jlyaoyuli, May 04 '22 23:05

/assign @chensun

jlyaoyuli, May 04 '22 23:05

Hi, customer here :)

I'm having an issue that might be related to this (according to https://issuetracker.google.com/issues/226569351). Happy to post elsewhere if not.

`set_retry` does not appear to work when `backoff_duration` is set to an integer. We get the error `AttributeError: 'int' object has no attribute 'replace'`.

It seems that coercing an integer to a string is not implemented, even though the docs allow an integer `backoff_duration`: https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.BaseOp.set_retry

Admittedly, passing an integer does violate the type hint.
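One way the integer case could be handled is to coerce a bare integer into a seconds string before the duration is parsed. This is only a sketch: `coerce_backoff_duration` is a hypothetical helper, not part of kfp, and treating a bare number as seconds is an assumption taken from the `set_retry` docs linked above.

```python
from typing import Optional, Union


def coerce_backoff_duration(value: Union[int, str, None]) -> Optional[str]:
    """Hypothetical helper (not in kfp): normalize backoff_duration to a string.

    Assumption: a bare integer means seconds, as the set_retry docs describe.
    Strings pass through unchanged so values like '2m' or '1h' keep their unit.
    """
    if value is None or isinstance(value, str):
        return value
    # bool is a subclass of int, so reject it explicitly rather than emit "Trues"
    if isinstance(value, int) and not isinstance(value, bool):
        if value < 0:
            raise ValueError("backoff_duration must be non-negative")
        return f"{value}s"
    raise TypeError(f"unsupported backoff_duration type: {type(value).__name__}")


print(coerce_backoff_duration(10))    # 10s
print(coerce_backoff_duration("2m"))  # 2m
```

Until something like this exists, passing the duration as a string, e.g. `.set_retry(3, backoff_duration='10s')`, should sidestep the error, since the failing `.replace` call only breaks on non-string input.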

Full error below.

To reproduce, add something like `.set_retry(3, backoff_duration=10)` to a ContainerOp. This ends up as a call along the lines of:

```python
from kfp.v2.compiler.compiler_utils import make_retry_policy_proto

# Repro: an integer backoff_duration reaches the duration parser unconverted
make_retry_policy_proto(
    max_retry_count=3,
    backoff_duration=10,
    backoff_factor=None,
    backoff_max_duration=None
)
```

which gives the full error:

```
[...] in compile_pipeline(output_filename, pipeline_func)
     10     compiler.Compiler().compile(
     11         pipeline_func = pipeline_func,
---> 12         package_path = output_filename
     13     )
     14 

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
   1304                 pipeline_func=pipeline_func,
   1305                 pipeline_name=pipeline_name,
-> 1306                 pipeline_parameters_override=pipeline_parameters)
   1307             self._write_pipeline(pipeline_job, package_path)
   1308         finally:

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
   1249         pipeline_spec = self._create_pipeline_spec(
   1250             args_list_with_defaults,
-> 1251             dsl_pipeline,
   1252         )
   1253 

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_spec(self, args, pipeline)
   1055                 op_to_parent_groups=op_name_to_parent_groups,
   1056                 opgroup_to_parent_groups=opgroup_name_to_parent_groups,
-> 1057                 op_name_to_for_loop_op=op_name_to_for_loop_op,
   1058             )
   1059 

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _group_to_dag_spec(self, group, inputs, dependencies, pipeline_spec, deployment_config, rootgroup_name, op_to_parent_groups, opgroup_to_parent_groups, op_name_to_for_loop_op)
    751                         backoff_duration=subgroup.backoff_duration,
    752                         backoff_factor=subgroup.backoff_factor,
--> 753                         backoff_max_duration=subgroup.backoff_max_duration,
    754                     )
    755                     subgroup.task_spec.retry_policy.CopyFrom(retry_policy_proto)

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler_utils.py in make_retry_policy_proto(max_retry_count, backoff_duration, backoff_factor, backoff_max_duration)
    153     backoff_max_duration = backoff_max_duration or '3600s'
    154 
--> 155     backoff_duration_seconds = f'{convert_duration_to_seconds(backoff_duration)}s'
    156     backoff_max_duration_seconds = f'{min(convert_duration_to_seconds(backoff_max_duration), 3600)}s'
    157 

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler_utils.py in convert_duration_to_seconds(duration)
    209         int: The number of seconds in the duration.
    210     """
--> 211     duration = normalize_time_string(duration)
    212     seconds_per_unit = {'s': 1, 'm': 60, 'h': 3_600, 'd': 86_400, 'w': 604_800}
    213     if duration[-1] not in seconds_per_unit.keys():

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler_utils.py in normalize_time_string(duration)
    180         str: The normalized duration string.
    181     """
--> 182     no_ws_duration = duration.replace(' ', '')
    183     duration_split = [el for el in re.split(r'(\D+)', no_ws_duration) if el]
    184 

AttributeError: 'int' object has no attribute 'replace'
```
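To see why only non-string input fails, here is a rough reconstruction of the parsing path. Only the whitespace strip, the `re.split(r'(\D+)', ...)` call and the `seconds_per_unit` table are taken from the traceback; the rest of both functions is not shown there, so the unit handling below (bare number means seconds, first letter of the unit is kept) is an assumption, not the actual kfp source.

```python
import re

# Unit table as shown at compiler_utils.py line 212 in the traceback.
SECONDS_PER_UNIT = {'s': 1, 'm': 60, 'h': 3_600, 'd': 86_400, 'w': 604_800}


def normalize_time_string(duration: str) -> str:
    # From lines 182-183: the first thing done is str.replace, so an int
    # raises AttributeError here before any parsing happens.
    no_ws_duration = duration.replace(' ', '')
    duration_split = [el for el in re.split(r'(\D+)', no_ws_duration) if el]
    # Assumed behavior from here on (not visible in the traceback):
    if not duration_split or not duration_split[0].isdigit():
        raise ValueError(f"Invalid duration: {duration!r}")
    if len(duration_split) == 1:
        return duration_split[0] + 's'  # bare number defaults to seconds
    if len(duration_split) == 2:
        return duration_split[0] + duration_split[1][0].lower()  # '5minutes' -> '5m'
    raise ValueError(f"Compound durations are not handled here: {duration!r}")


def convert_duration_to_seconds(duration: str) -> int:
    duration = normalize_time_string(duration)
    if duration[-1] not in SECONDS_PER_UNIT:
        raise ValueError(f"Unsupported duration unit: {duration[-1]!r}")
    return int(duration[:-1]) * SECONDS_PER_UNIT[duration[-1]]


print(convert_duration_to_seconds("10s"))  # 10
print(convert_duration_to_seconds("2m"))   # 120
try:
    convert_duration_to_seconds(10)
except AttributeError as err:
    print(err)  # 'int' object has no attribute 'replace'
```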

cgparkinson, Aug 02 '22 12:08

Retry run is now supported; see #8804.

Linchin, Aug 23 '23 23:08