kfp-tekton
kfp-tekton copied to clipboard
bug(sdk) Double-looping over the same pipelineparam miscompiles
/kind bug
What steps did you take and what happened: Create two loop, one nested in the other, both iterating over the same pipeline-parameter. Inside of the inner loop, add a task that uses the items of both of them.
from kfp import dsl
from kfp.components import load_component_from_text
from kfp_tekton.tekton import Loop
from kfp_tekton.compiler import TektonCompiler
class Coder:
def empty(self):
return ""
TektonCompiler._get_unique_id_code = Coder.empty
def PrintOp(name: str, msg: str = None):
if msg is None:
msg = name
print_op = load_component_from_text(
"""
name: %s
inputs:
- {name: input_text, type: String, description: 'Represents an input parameter.'}
outputs:
- {name: output_value, type: String, description: 'Represents an output paramter.'}
implementation:
container:
image: alpine:3.6
command:
- sh
- -c
- |
set -e
echo $0 > $1
- {inputValue: input_text}
- {outputPath: output_value}
""" % (name)
)
return print_op(msg)
@dsl.pipeline("loop-multi")
def loop_multi(param: list = ["a", "b", "c"]):
with Loop(param) as it0:
with Loop(param) as it1:
PrintOp("print-01", f"print {it0} {it1}")
if __name__ == '__main__':
TektonCompiler().compile(loop_multi, __file__.replace('.py', '.yaml'))
The generated yaml:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: loop-multi
annotations:
tekton.dev/output_artifacts: '{"print-01": [{"key": "artifacts/$PIPELINERUN/print-01/output_value.tgz",
"name": "print-01-output_value", "path": "/tmp/outputs/output_value/data"}]}'
tekton.dev/input_artifacts: '{}'
tekton.dev/artifact_bucket: mlpipeline
tekton.dev/artifact_endpoint: minio-service.kubeflow:9000
tekton.dev/artifact_endpoint_scheme: http://
tekton.dev/artifact_items: '{"print-01": [["output_value", "$(results.output-value.path)"]]}'
sidecar.istio.io/inject: "false"
pipelines.kubeflow.org/big_data_passing_format: $(workspaces.$TASK_NAME.path)/artifacts/$ORIG_PR_NAME/$TASKRUN_NAME/$TASK_PARAM_NAME
pipelines.kubeflow.org/pipeline_spec: '{"inputs": [{"default": "[\"a\", \"b\",
\"c\"]", "name": "param", "optional": true, "type": "JsonArray"}], "name": "loop-multi"}'
spec:
params:
- name: param
value: '["a", "b", "c"]'
pipelineSpec:
params:
- name: param
default: '["a", "b", "c"]'
tasks:
- name: loop-multi-for-loop-1
params:
- name: param
value: $(params.param)
- name: param-loop-item
value: $(params.param)
taskSpec:
apiVersion: custom.tekton.dev/v1alpha1
kind: PipelineLoop
spec:
pipelineSpec:
params:
- name: param
type: string
- name: param-loop-item
type: string
tasks:
- name: loop-multi-for-loop-2
params:
- name: param-loop-item
value: $(params.param)
taskSpec:
apiVersion: custom.tekton.dev/v1alpha1
kind: PipelineLoop
spec:
pipelineSpec:
params:
- name: param-loop-item
type: string
tasks:
- name: print-01
params:
- name: param-loop-item
value: $(params.param-loop-item)
taskSpec:
steps:
- name: main
command:
- sh
- -c
- |
set -e
echo $0 > $1
- print $(inputs.params.param-loop-item) $(inputs.params.param-loop-item)
- $(results.output-value.path)
image: alpine:3.6
params:
- name: param-loop-item
type: string
results:
- name: output-value
type: string
description: /tmp/outputs/output_value/data
metadata:
labels:
pipelines.kubeflow.org/cache_enabled: "true"
annotations:
pipelines.kubeflow.org/component_spec_digest: '{"name":
"print-01", "outputs": [{"description": "Represents
an output paramter.", "name": "output_value", "type":
"String"}], "version": "print-01@sha256=d511ac628d43cc5b393fbebd10be93662b30117f1413b84afd4e7b2e5ff5ed33"}'
timeout: 525600m
iterateParam: param-loop-item
metadata:
labels:
pipelines.kubeflow.org/cache_enabled: "true"
iterateParam: param-loop-item
metadata:
labels:
pipelines.kubeflow.org/cache_enabled: "true"
timeouts:
pipeline: 1051200m
tasks: 525600m
Note:
PrintOp("print-01", f"print {it0} {it1}")
vs
print $(inputs.params.param-loop-item) $(inputs.params.param-loop-item)
So at runtime, it's:
- print a a
- print b b
- print c c
- print a a
- print b b
- print c c
- print a a
- print b b
- print c c
Instead of:
- print a a
- print a b
- print a c
- print b a
- print b b
- print b c
- print c a
- print c b
- print c c
What did you expect to happen: Something like:
print $(inputs.params.param-loop-item-0) $(inputs.params.param-loop-item-1)
The first one being generated by the outer loop, the second one --- by the inner one.
Additional information: Nothing to report.
Environment:
- Python Version (use
python --version): 3.10 - SDK Version: 33e3ab2a82a39875d0727e61bde9a1f87682daf1
- Tekton Version (use
tkn version): irrelevant - Kubernetes Version (use
kubectl version): irrelevant - OS (e.g. from
/etc/os-release): irrelevant
/assign
Seems like this is a DSL level bug as the DSL make both it0 and it1 pointing to the same variable instead of copying it. This bug is also happening in Argo as well.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: loop-multi-
annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.13, pipelines.kubeflow.org/pipeline_compilation_time: '2022-09-19T09:59:53.323966',
pipelines.kubeflow.org/pipeline_spec: '{"inputs": [{"default": "[\"a\", \"b\",
\"c\"]", "name": "param", "optional": true, "type": "JsonArray"}], "name": "loop-multi"}'}
labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.13}
spec:
entrypoint: loop-multi
templates:
- name: for-loop-1
dag:
tasks:
- name: for-loop-2
template: for-loop-2
arguments:
parameters:
- {name: param-loop-item, value: '{{item}}'}
withParam: '{{workflow.parameters.param}}'
- name: for-loop-2
inputs:
parameters:
- {name: param-loop-item}
dag:
tasks:
- name: print-01
template: print-01
arguments:
parameters:
- {name: param-loop-item, value: '{{inputs.parameters.param-loop-item}}'}
- name: loop-multi
dag:
tasks:
- {name: for-loop-1, template: for-loop-1, withParam: '{{workflow.parameters.param}}'}
- name: print-01
container:
args: []
command:
- sh
- -c
- |
set -e
echo $0 > $1
- print {{inputs.parameters.param-loop-item}} {{inputs.parameters.param-loop-item}}
- /tmp/outputs/output_value/data
image: alpine:3.6
inputs:
parameters:
- {name: param-loop-item}
outputs:
artifacts:
- {name: print-01-output_value, path: /tmp/outputs/output_value/data}
metadata:
labels:
pipelines.kubeflow.org/kfp_sdk_version: 1.8.13
pipelines.kubeflow.org/pipeline-sdk-type: kfp
pipelines.kubeflow.org/enable_caching: "true"
annotations: {pipelines.kubeflow.org/component_spec: '{"implementation": {"container":
{"command": ["sh", "-c", "set -e\necho $0 > $1\n", {"inputValue": "input_text"},
{"outputPath": "output_value"}], "image": "alpine:3.6"}}, "inputs": [{"description":
"Represents an input parameter.", "name": "input_text", "type": "String"}],
"name": "print-01", "outputs": [{"description": "Represents an output paramter.",
"name": "output_value", "type": "String"}]}', pipelines.kubeflow.org/component_ref: '{"digest":
"5c67ea456b952b797005405fe2398d732757b945a7978b864a8366f7da98e761"}', pipelines.kubeflow.org/arguments.parameters: '{"input_text":
"print {{inputs.parameters.param-loop-item}} {{inputs.parameters.param-loop-item}}"}'}
arguments:
parameters:
- {name: param, value: '["a", "b", "c"]'}
serviceAccountName: pipeline-runner