kfp-tekton icon indicating copy to clipboard operation
kfp-tekton copied to clipboard

bug(sdk) Double-looping over the same pipelineparam miscompiles

Open Udiknedormin opened this issue 3 years ago • 2 comments

/kind bug

What steps did you take and what happened: Create two loop, one nested in the other, both iterating over the same pipeline-parameter. Inside of the inner loop, add a task that uses the items of both of them.

from kfp import dsl
from kfp.components import load_component_from_text
from kfp_tekton.tekton import Loop
from kfp_tekton.compiler import TektonCompiler


class Coder:
  def empty(self):
    return ""


TektonCompiler._get_unique_id_code = Coder.empty


def PrintOp(name: str, msg: str = None):
  if msg is None:
    msg = name
  print_op = load_component_from_text(
  """
  name: %s
  inputs:
  - {name: input_text, type: String, description: 'Represents an input parameter.'}
  outputs:
  - {name: output_value, type: String, description: 'Represents an output paramter.'}
  implementation:
    container:
      image: alpine:3.6
      command:
      - sh
      - -c
      - |
        set -e
        echo $0 > $1
      - {inputValue: input_text}
      - {outputPath: output_value}
  """ % (name)
  )
  return print_op(msg)


@dsl.pipeline("loop-multi")
def loop_multi(param: list = ["a", "b", "c"]):
  with Loop(param) as it0:
    with Loop(param) as it1:
      PrintOp("print-01", f"print {it0} {it1}")


if __name__ == '__main__':
  TektonCompiler().compile(loop_multi, __file__.replace('.py', '.yaml'))

The generated yaml:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: loop-multi
  annotations:
    tekton.dev/output_artifacts: '{"print-01": [{"key": "artifacts/$PIPELINERUN/print-01/output_value.tgz",
      "name": "print-01-output_value", "path": "/tmp/outputs/output_value/data"}]}'
    tekton.dev/input_artifacts: '{}'
    tekton.dev/artifact_bucket: mlpipeline
    tekton.dev/artifact_endpoint: minio-service.kubeflow:9000
    tekton.dev/artifact_endpoint_scheme: http://
    tekton.dev/artifact_items: '{"print-01": [["output_value", "$(results.output-value.path)"]]}'
    sidecar.istio.io/inject: "false"
    pipelines.kubeflow.org/big_data_passing_format: $(workspaces.$TASK_NAME.path)/artifacts/$ORIG_PR_NAME/$TASKRUN_NAME/$TASK_PARAM_NAME
    pipelines.kubeflow.org/pipeline_spec: '{"inputs": [{"default": "[\"a\", \"b\",
      \"c\"]", "name": "param", "optional": true, "type": "JsonArray"}], "name": "loop-multi"}'
spec:
  params:
  - name: param
    value: '["a", "b", "c"]'
  pipelineSpec:
    params:
    - name: param
      default: '["a", "b", "c"]'
    tasks:
    - name: loop-multi-for-loop-1
      params:
      - name: param
        value: $(params.param)
      - name: param-loop-item
        value: $(params.param)
      taskSpec:
        apiVersion: custom.tekton.dev/v1alpha1
        kind: PipelineLoop
        spec:
          pipelineSpec:
            params:
            - name: param
              type: string
            - name: param-loop-item
              type: string
            tasks:
            - name: loop-multi-for-loop-2
              params:
              - name: param-loop-item
                value: $(params.param)
              taskSpec:
                apiVersion: custom.tekton.dev/v1alpha1
                kind: PipelineLoop
                spec:
                  pipelineSpec:
                    params:
                    - name: param-loop-item
                      type: string
                    tasks:
                    - name: print-01
                      params:
                      - name: param-loop-item
                        value: $(params.param-loop-item)
                      taskSpec:
                        steps:
                        - name: main
                          command:
                          - sh
                          - -c
                          - |
                            set -e
                            echo $0 > $1
                          - print $(inputs.params.param-loop-item) $(inputs.params.param-loop-item)
                          - $(results.output-value.path)
                          image: alpine:3.6
                        params:
                        - name: param-loop-item
                          type: string
                        results:
                        - name: output-value
                          type: string
                          description: /tmp/outputs/output_value/data
                        metadata:
                          labels:
                            pipelines.kubeflow.org/cache_enabled: "true"
                          annotations:
                            pipelines.kubeflow.org/component_spec_digest: '{"name":
                              "print-01", "outputs": [{"description": "Represents
                              an output paramter.", "name": "output_value", "type":
                              "String"}], "version": "print-01@sha256=d511ac628d43cc5b393fbebd10be93662b30117f1413b84afd4e7b2e5ff5ed33"}'
                      timeout: 525600m
                  iterateParam: param-loop-item
                metadata:
                  labels:
                    pipelines.kubeflow.org/cache_enabled: "true"
          iterateParam: param-loop-item
        metadata:
          labels:
            pipelines.kubeflow.org/cache_enabled: "true"
  timeouts:
    pipeline: 1051200m
    tasks: 525600m

Note:

PrintOp("print-01", f"print {it0} {it1}")

vs

print $(inputs.params.param-loop-item) $(inputs.params.param-loop-item)

So at runtime, it's:

  • print a a
  • print b b
  • print c c
  • print a a
  • print b b
  • print c c
  • print a a
  • print b b
  • print c c

Instead of:

  • print a a
  • print a b
  • print a c
  • print b a
  • print b b
  • print b c
  • print c a
  • print c b
  • print c c

What did you expect to happen: Something like:

print $(inputs.params.param-loop-item-0) $(inputs.params.param-loop-item-1)

The first one being generated by the outer loop, the second one --- by the inner one.

Additional information: Nothing to report.

Environment:

  • Python Version (use python --version): 3.10
  • SDK Version: 33e3ab2a82a39875d0727e61bde9a1f87682daf1
  • Tekton Version (use tkn version): irrelevant
  • Kubernetes Version (use kubectl version): irrelevant
  • OS (e.g. from /etc/os-release): irrelevant

Udiknedormin avatar Sep 15 '22 20:09 Udiknedormin

/assign

Tomcli avatar Sep 19 '22 16:09 Tomcli

Seems like this is a DSL level bug as the DSL make both it0 and it1 pointing to the same variable instead of copying it. This bug is also happening in Argo as well.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loop-multi-
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.13, pipelines.kubeflow.org/pipeline_compilation_time: '2022-09-19T09:59:53.323966',
    pipelines.kubeflow.org/pipeline_spec: '{"inputs": [{"default": "[\"a\", \"b\",
      \"c\"]", "name": "param", "optional": true, "type": "JsonArray"}], "name": "loop-multi"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.13}
spec:
  entrypoint: loop-multi
  templates:
  - name: for-loop-1
    dag:
      tasks:
      - name: for-loop-2
        template: for-loop-2
        arguments:
          parameters:
          - {name: param-loop-item, value: '{{item}}'}
        withParam: '{{workflow.parameters.param}}'
  - name: for-loop-2
    inputs:
      parameters:
      - {name: param-loop-item}
    dag:
      tasks:
      - name: print-01
        template: print-01
        arguments:
          parameters:
          - {name: param-loop-item, value: '{{inputs.parameters.param-loop-item}}'}
  - name: loop-multi
    dag:
      tasks:
      - {name: for-loop-1, template: for-loop-1, withParam: '{{workflow.parameters.param}}'}
  - name: print-01
    container:
      args: []
      command:
      - sh
      - -c
      - |
        set -e
        echo $0 > $1
      - print {{inputs.parameters.param-loop-item}} {{inputs.parameters.param-loop-item}}
      - /tmp/outputs/output_value/data
      image: alpine:3.6
    inputs:
      parameters:
      - {name: param-loop-item}
    outputs:
      artifacts:
      - {name: print-01-output_value, path: /tmp/outputs/output_value/data}
    metadata:
      labels:
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.13
        pipelines.kubeflow.org/pipeline-sdk-type: kfp
        pipelines.kubeflow.org/enable_caching: "true"
      annotations: {pipelines.kubeflow.org/component_spec: '{"implementation": {"container":
          {"command": ["sh", "-c", "set -e\necho $0 > $1\n", {"inputValue": "input_text"},
          {"outputPath": "output_value"}], "image": "alpine:3.6"}}, "inputs": [{"description":
          "Represents an input parameter.", "name": "input_text", "type": "String"}],
          "name": "print-01", "outputs": [{"description": "Represents an output paramter.",
          "name": "output_value", "type": "String"}]}', pipelines.kubeflow.org/component_ref: '{"digest":
          "5c67ea456b952b797005405fe2398d732757b945a7978b864a8366f7da98e761"}', pipelines.kubeflow.org/arguments.parameters: '{"input_text":
          "print {{inputs.parameters.param-loop-item}} {{inputs.parameters.param-loop-item}}"}'}
  arguments:
    parameters:
    - {name: param, value: '["a", "b", "c"]'}
  serviceAccountName: pipeline-runner

Tomcli avatar Sep 19 '22 17:09 Tomcli