Pipelineloop cachehit on wrong argument inputs

Open • Tomcli opened this issue 3 years ago • 1 comment

/kind bug

What steps did you take and what happened: The pipeline below can reproduce the bug. When a nested pipeline takes the same argument and passes it down to the nested component, the PipelineLoop cache hit is triggered because the cache lookup uses the same pipeline argument reference. However, because the argument is a loop item, its actual content differs on each iteration. Since the nested pipeline's content is not stringified until the nested pipeline is executed, the cache check concludes that the nested loop has already run, which is a wrong assumption.
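The collision described above can be illustrated with a minimal Python sketch. It is not the PipelineLoop controller's code: `cache_key` and `resolve` are hypothetical helpers, and hashing a JSON dump of the parameters is only an assumed stand-in for however the controller derives its cache key.

```python
import hashlib
import json
import re

# Matches Tekton-style parameter references such as $(params.loop-item-param-1).
PARAM_REF = re.compile(r"\$\(params\.([\w-]+)\)")

def cache_key(params: dict) -> str:
    """Hypothetical cache key: a hash of the serialized parameters."""
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()

def resolve(params: dict, values: dict) -> dict:
    """Hypothetical helper: replace each $(params.NAME) with its concrete value."""
    return {
        name: PARAM_REF.sub(lambda m: values[m.group(1)], value)
        for name, value in params.items()
    }

# Parameters of the nested loop as they appear in the manifest (unresolved).
nested_params = {
    "loop-item-param-1": "$(params.loop-item-param-1)",
    "loop-item-param-3": "$(params.loop-item-param-3)",
}
items = ["a", "b", "c"]

# Keying on the unresolved references: every outer iteration yields the same key.
unresolved_keys = {cache_key(nested_params) for _ in items}

# Keying after substitution: each outer iteration yields its own key.
resolved_keys = {
    cache_key(resolve(nested_params, {
        "loop-item-param-1": item,
        "loop-item-param-3": json.dumps(items),
    }))
    for item in items
}

print(len(unresolved_keys), len(resolved_keys))
```

Under these assumptions, the unresolved references collapse to a single key while the resolved values give one key per outer iteration, which is the distinction the cache check would need to make.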

Here is the pipeline to reproduce this error:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: loop-multi
  annotations:
    tekton.dev/output_artifacts: '{"print-01": [{"key": "artifacts/$PIPELINERUN/print-01/output_value.tgz",
      "name": "print-01-output_value", "path": "/tmp/outputs/output_value/data"}]}'
    tekton.dev/input_artifacts: '{}'
    tekton.dev/artifact_bucket: mlpipeline
    tekton.dev/artifact_endpoint: minio-service.kubeflow:9000
    tekton.dev/artifact_endpoint_scheme: http://
    tekton.dev/artifact_items: '{"print-01": [["output_value", "$(results.output-value.path)"]]}'
    sidecar.istio.io/inject: "false"
    tekton.dev/template: ''
    pipelines.kubeflow.org/big_data_passing_format: $(workspaces.$TASK_NAME.path)/artifacts/$ORIG_PR_NAME/$TASKRUN_NAME/$TASK_PARAM_NAME
    pipelines.kubeflow.org/pipeline_spec: '{"inputs": [{"default": "[\"a\", \"b\",
      \"c\"]", "name": "param", "optional": true, "type": "JsonArray"}], "name": "loop-multi"}'
  labels:
    pipelines.kubeflow.org/pipelinename: ''
    pipelines.kubeflow.org/generation: ''
spec:
  params:
  - name: param
    value: '["a", "b", "c"]'
  pipelineSpec:
    params:
    - name: param
      default: '["a", "b", "c"]'
    tasks:
    - name: loop-multi-for-loop-2
      params:
      - name: loop-item-param-1
        value: $(params.param)
      - name: loop-item-param-3
        value: $(params.param)
      - name: param
        value: $(params.param)
      taskSpec:
        apiVersion: custom.tekton.dev/v1alpha1
        kind: PipelineLoop
        spec:
          pipelineSpec:
            params:
            - name: loop-item-param-1
              type: string
            - name: loop-item-param-3
              type: string
            - name: param
              type: string
            tasks:
            - name: loop-multi-for-loop-4
              params:
              - name: loop-item-param-1
                value: $(params.loop-item-param-1)
              - name: loop-item-param-3
                value: $(params.loop-item-param-3)
              taskSpec:
                apiVersion: custom.tekton.dev/v1alpha1
                kind: PipelineLoop
                spec:
                  pipelineSpec:
                    params:
                    - name: loop-item-param-1
                      type: string
                    - name: loop-item-param-3
                      type: string
                    tasks:
                    - name: print-01
                      params:
                      - name: loop-item-param-1
                        value: $(params.loop-item-param-1)
                      - name: loop-item-param-3
                        value: $(params.loop-item-param-3)
                      taskSpec:
                        steps:
                        - name: main
                          command:
                          - sh
                          - -c
                          - |
                            set -e
                            echo $0 > $1
                          - print $(inputs.params.loop-item-param-1) $(inputs.params.loop-item-param-3)
                          - $(results.output-value.path)
                          image: alpine:3.6
                        params:
                        - name: loop-item-param-1
                          type: string
                        - name: loop-item-param-3
                          type: string
                        results:
                        - name: output-value
                          type: string
                          description: /tmp/outputs/output_value/data
                        metadata:
                          labels:
                            pipelines.kubeflow.org/cache_enabled: "true"
                          annotations:
                            pipelines.kubeflow.org/component_spec_digest: '{"name":
                              "print-01", "outputs": [{"description": "Represents
                              an output paramter.", "name": "output_value", "type":
                              "String"}], "version": "print-01@sha256=d511ac628d43cc5b393fbebd10be93662b30117f1413b84afd4e7b2e5ff5ed33"}'
                  iterateParam: loop-item-param-3
                metadata:
                  labels:
                    pipelines.kubeflow.org/cache_enabled: "true"
          iterateParam: loop-item-param-1
        metadata:
          labels:
            pipelines.kubeflow.org/cache_enabled: "true"

What did you expect to happen: This should produce 9 printop pods (3 outer iterations times 3 inner iterations), but the cache hit was triggered incorrectly for the last 2 nested loops, so the run ended up with only 3 printop pods.
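The expected and observed pod counts can be checked with a small simulation. This is a sketch under stated assumptions, not the actual controller logic: `count_print_runs` is a hypothetical function, the cache is modeled as a plain set, and only the nested loop's cache lookup is modeled (the per-task cache on `print-01` is ignored).

```python
def count_print_runs(items, resolve_first):
    """Count print-01 executions for a two-level loop over `items`.

    resolve_first=False models a cache lookup keyed on the unresolved
    parameter references; resolve_first=True models a lookup keyed on
    the concrete values of each outer iteration.
    """
    seen = set()  # hypothetical cache of nested-loop keys already executed
    runs = 0
    for outer_item in items:
        if resolve_first:
            key = (outer_item, tuple(items))
        else:
            key = ("$(params.loop-item-param-1)", "$(params.loop-item-param-3)")
        if key in seen:
            continue  # cache hit: the whole nested loop is skipped
        seen.add(key)
        runs += len(items)  # the nested loop runs print-01 once per item
    return runs

buggy = count_print_runs(["a", "b", "c"], resolve_first=False)
expected = count_print_runs(["a", "b", "c"], resolve_first=True)
print(buggy, expected)
```

With the default `param` value `["a", "b", "c"]`, the unresolved-key model gives 3 runs (only the first nested loop executes) and the resolved-key model gives 9, matching the counts reported in this issue.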

Additional information: [Miscellaneous information that will assist in solving the issue.]

Environment:

  • Python Version (use python --version):
  • SDK Version:
  • Tekton Version (use tkn version):
  • Kubernetes Version (use kubectl version):
  • OS (e.g. from /etc/os-release):

Tomcli, Oct 04 '22 16:10

/assign @ScrapCodes

Tomcli, Oct 04 '22 16:10