kfp-tekton
PipelineLoop cache hit on wrong argument inputs
/kind bug
What steps did you take and what happened: The pipeline below reproduces the bug. When a nested pipeline takes the same argument as its parent and passes it down to the nested component, the PipelineLoop cache hit is triggered because the lookup uses the same pipeline argument reference. However, because the argument is a loop item, its actual content differs on each iteration. Since the nested pipeline's content isn't stringified until the nested pipeline is executed, the cache wrongly assumes the nested loop has already run.
Here is the pipeline to reproduce this error:
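To make the failure mode concrete, here is a minimal sketch of how a cache key built from an unresolved parameter reference collides across loop iterations. The `cache_key` and `nested_loop_spec` helpers are hypothetical stand-ins for illustration only; they are not the actual kfp-tekton cache implementation, and only the task and parameter names are taken from the pipeline in this issue.

```python
import hashlib
import json


def cache_key(spec: dict) -> str:
    """Hash a spec deterministically (hypothetical stand-in for a cache key)."""
    payload = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def nested_loop_spec(item: str) -> dict:
    """Nested loop spec with the loop item already substituted in."""
    return {"task": "loop-multi-for-loop-4", "loop-item-param-1": item}


items = ["a", "b", "c"]

# Unresolved: every outer iteration presents the same reference string,
# so all three iterations produce an identical key.
unresolved = {
    "task": "loop-multi-for-loop-4",
    "loop-item-param-1": "$(params.loop-item-param-1)",
}
unresolved_keys = {cache_key(unresolved) for _ in items}

# Resolved: each outer iteration substitutes its own item before hashing,
# so the keys differ per iteration.
resolved_keys = {cache_key(nested_loop_spec(item)) for item in items}

print(len(unresolved_keys), len(resolved_keys))
```

Under these assumptions, the unresolved form yields a single key for all three iterations, which is what would let the second and third iterations look like cache hits.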
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: loop-multi
  annotations:
    tekton.dev/output_artifacts: '{"print-01": [{"key": "artifacts/$PIPELINERUN/print-01/output_value.tgz",
      "name": "print-01-output_value", "path": "/tmp/outputs/output_value/data"}]}'
    tekton.dev/input_artifacts: '{}'
    tekton.dev/artifact_bucket: mlpipeline
    tekton.dev/artifact_endpoint: minio-service.kubeflow:9000
    tekton.dev/artifact_endpoint_scheme: http://
    tekton.dev/artifact_items: '{"print-01": [["output_value", "$(results.output-value.path)"]]}'
    sidecar.istio.io/inject: "false"
    tekton.dev/template: ''
    pipelines.kubeflow.org/big_data_passing_format: $(workspaces.$TASK_NAME.path)/artifacts/$ORIG_PR_NAME/$TASKRUN_NAME/$TASK_PARAM_NAME
    pipelines.kubeflow.org/pipeline_spec: '{"inputs": [{"default": "[\"a\", \"b\",
      \"c\"]", "name": "param", "optional": true, "type": "JsonArray"}], "name": "loop-multi"}'
  labels:
    pipelines.kubeflow.org/pipelinename: ''
    pipelines.kubeflow.org/generation: ''
spec:
  params:
  - name: param
    value: '["a", "b", "c"]'
  pipelineSpec:
    params:
    - name: param
      default: '["a", "b", "c"]'
    tasks:
    - name: loop-multi-for-loop-2
      params:
      - name: loop-item-param-1
        value: $(params.param)
      - name: loop-item-param-3
        value: $(params.param)
      - name: param
        value: $(params.param)
      taskSpec:
        apiVersion: custom.tekton.dev/v1alpha1
        kind: PipelineLoop
        spec:
          pipelineSpec:
            params:
            - name: loop-item-param-1
              type: string
            - name: loop-item-param-3
              type: string
            - name: param
              type: string
            tasks:
            - name: loop-multi-for-loop-4
              params:
              - name: loop-item-param-1
                value: $(params.loop-item-param-1)
              - name: loop-item-param-3
                value: $(params.loop-item-param-3)
              taskSpec:
                apiVersion: custom.tekton.dev/v1alpha1
                kind: PipelineLoop
                spec:
                  pipelineSpec:
                    params:
                    - name: loop-item-param-1
                      type: string
                    - name: loop-item-param-3
                      type: string
                    tasks:
                    - name: print-01
                      params:
                      - name: loop-item-param-1
                        value: $(params.loop-item-param-1)
                      - name: loop-item-param-3
                        value: $(params.loop-item-param-3)
                      taskSpec:
                        steps:
                        - name: main
                          command:
                          - sh
                          - -c
                          - |
                            set -e
                            echo $0 > $1
                          - print $(inputs.params.loop-item-param-1) $(inputs.params.loop-item-param-3)
                          - $(results.output-value.path)
                          image: alpine:3.6
                        params:
                        - name: loop-item-param-1
                          type: string
                        - name: loop-item-param-3
                          type: string
                        results:
                        - name: output-value
                          type: string
                          description: /tmp/outputs/output_value/data
                        metadata:
                          labels:
                            pipelines.kubeflow.org/cache_enabled: "true"
                          annotations:
                            pipelines.kubeflow.org/component_spec_digest: '{"name":
                              "print-01", "outputs": [{"description": "Represents
                              an output paramter.", "name": "output_value", "type":
                              "String"}], "version": "print-01@sha256=d511ac628d43cc5b393fbebd10be93662b30117f1413b84afd4e7b2e5ff5ed33"}'
                  iterateParam: loop-item-param-3
                metadata:
                  labels:
                    pipelines.kubeflow.org/cache_enabled: "true"
          iterateParam: loop-item-param-1
        metadata:
          labels:
            pipelines.kubeflow.org/cache_enabled: "true"
What did you expect to happen: This should produce 9 printop pods (3 outer iterations × 3 inner iterations), but instead the cache hit was triggered incorrectly for the last 2 nested loops, so the run ended up with only 3 printop pods.
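The expected and observed pod counts can be reproduced with a toy model of the two loops. This is a simplified simulation under stated assumptions, not the PipelineLoop controller's logic: `count_print_pods` and its `inner_key` argument are hypothetical, and the cache is modeled as a plain set of keys for the nested loop.

```python
from typing import Callable, Set

ITEMS = ["a", "b", "c"]  # value of `param` in the PipelineRun above


def count_print_pods(inner_key: Callable[[str], str]) -> int:
    """Count print-01 pods, skipping nested loops whose key was already seen."""
    seen: Set[str] = set()
    pods = 0
    for outer_item in ITEMS:
        key = inner_key(outer_item)
        if key in seen:
            continue  # modeled cache hit: the whole nested loop is skipped
        seen.add(key)
        pods += len(ITEMS)  # one print-01 pod per inner loop item
    return pods


# Key built from the unresolved reference: identical for every outer item.
observed = count_print_pods(lambda outer_item: "$(params.loop-item-param-1)")

# Key built from the resolved loop item: distinct for every outer item.
expected = count_print_pods(lambda outer_item: outer_item)

print(observed, expected)
```

In this model, keying on the unresolved reference runs the nested loop once (3 pods), while keying on the resolved item runs it for every outer iteration (9 pods), matching the counts reported in this issue.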
Additional information: [Miscellaneous information that will assist in solving the issue.]
Environment:
- Python Version (use python --version):
- SDK Version:
- Tekton Version (use tkn version):
- Kubernetes Version (use kubectl version):
- OS (e.g. from /etc/os-release):
/assign @ScrapCodes