pipelines
[backend] KFP v2 currently doesn't support "Retry run" function
In KFP v2, the Retry Run function hasn't been implemented. Currently, when users click the Retry button, they are blocked by this error message: https://screenshot.googleplex.com/3rFyhstqcjeyXAG.
Refer to: https://github.com/kubeflow/pipelines/blob/master/backend/src/apiserver/resource/resource_manager.go#L557-L559
/assign @chensun
Hi, customer here :)
I'm having an issue that might be related to this (according to https://issuetracker.google.com/issues/226569351). Happy to post elsewhere if not.
set_retry does not appear to work when given an integer backoff_duration. We get the error AttributeError: 'int' object has no attribute 'replace'.
It seems integer coercion to string is not implemented. An integer backoff_duration is allowed according to the docs here https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.BaseOp.set_retry, though it admittedly does violate the type hint.
Full error below.
To reproduce: add something like .set_retry(3, backoff_duration = 10) to a ContainerOp.
This comes through as something along the lines of:
from kfp.v2.compiler.compiler_utils import make_retry_policy_proto
make_retry_policy_proto(
    max_retry_count=3,
    backoff_duration=10,
    backoff_factor=None,
    backoff_max_duration=None
)
giving the full error:
[...] in compile_pipeline(output_filename, pipeline_func)
10 compiler.Compiler().compile(
11 pipeline_func = pipeline_func,
---> 12 package_path = output_filename
13 )
14
/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
1304 pipeline_func=pipeline_func,
1305 pipeline_name=pipeline_name,
-> 1306 pipeline_parameters_override=pipeline_parameters)
1307 self._write_pipeline(pipeline_job, package_path)
1308 finally:
/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
1249 pipeline_spec = self._create_pipeline_spec(
1250 args_list_with_defaults,
-> 1251 dsl_pipeline,
1252 )
1253
/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_spec(self, args, pipeline)
1055 op_to_parent_groups=op_name_to_parent_groups,
1056 opgroup_to_parent_groups=opgroup_name_to_parent_groups,
-> 1057 op_name_to_for_loop_op=op_name_to_for_loop_op,
1058 )
1059
/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _group_to_dag_spec(self, group, inputs, dependencies, pipeline_spec, deployment_config, rootgroup_name, op_to_parent_groups, opgroup_to_parent_groups, op_name_to_for_loop_op)
751 backoff_duration=subgroup.backoff_duration,
752 backoff_factor=subgroup.backoff_factor,
--> 753 backoff_max_duration=subgroup.backoff_max_duration,
754 )
755 subgroup.task_spec.retry_policy.CopyFrom(retry_policy_proto)
/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler_utils.py in make_retry_policy_proto(max_retry_count, backoff_duration, backoff_factor, backoff_max_duration)
153 backoff_max_duration = backoff_max_duration or '3600s'
154
--> 155 backoff_duration_seconds = f'{convert_duration_to_seconds(backoff_duration)}s'
156 backoff_max_duration_seconds = f'{min(convert_duration_to_seconds(backoff_max_duration), 3600)}s'
157
/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler_utils.py in convert_duration_to_seconds(duration)
209 int: The number of seconds in the duration.
210 """
--> 211 duration = normalize_time_string(duration)
212 seconds_per_unit = {'s': 1, 'm': 60, 'h': 3_600, 'd': 86_400, 'w': 604_800}
213 if duration[-1] not in seconds_per_unit.keys():
/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler_utils.py in normalize_time_string(duration)
180 str: The normalized duration string.
181 """
--> 182 no_ws_duration = duration.replace(' ', '')
183 duration_split = [el for el in re.split(r'(\D+)', no_ws_duration) if el]
184
AttributeError: 'int' object has no attribute 'replace'
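A possible workaround in the meantime, as a minimal sketch rather than anything taken from the KFP codebase: convert the integer to a duration string before it reaches set_retry. The helper name to_duration_string is hypothetical. The 's' suffix is assumed to be accepted because it appears among the seconds_per_unit keys in the traceback above.

```python
from typing import Union


def to_duration_string(value: Union[int, str]) -> str:
    """Hypothetical helper: turn an integer number of seconds into a string like '10s'.

    Strings are returned unchanged, so values such as '2m' keep working.
    """
    # bool is a subclass of int; reject it so True/False do not become '1s'/'0s'.
    if isinstance(value, bool):
        raise TypeError("backoff duration must be an int or str, not bool")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("backoff duration must be non-negative")
        return f"{value}s"
    if isinstance(value, str):
        return value
    raise TypeError(f"unsupported duration type: {type(value).__name__}")


# Intended usage (op stands for an existing ContainerOp; not executed here):
# op.set_retry(3, backoff_duration=to_duration_string(10))
```

This keeps the call within the type hint, so it should not depend on whether the compiler later adds its own int handling.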
Retry run is supported now, see #8804