Using set_display_name results in ‘Cannot Find Producer Task’ error in Kubeflow Pipelines

Open milosjava opened this issue 1 year ago • 12 comments

Environment

Kubeflow version: 1.9

KFP SDK version: 2.8.0

Backend: Argo

Steps to reproduce

  1. Define two components (addition and divide) and create a pipeline (hello_pipeline) that chains them together.
  2. Use the set_display_name function to set a display name for the first component (addition).
  3. Compile the pipeline and run it. The pipeline fails because the output of the first component is not available to the second component (divide). See the error log at the end of this issue.

Here is the Python code that reproduces the issue:

from kfp import dsl

@dsl.component
def addition(a: float, b: float) -> float:
    print("hi")
    return a + b

@dsl.component
def divide(a: float, b: float) -> float:
    print("hi")
    return a / b

@dsl.pipeline
def hello_pipeline(a: float = 1, b: float = 2, c: float = 3) -> None:
    total = addition(a=a, b=b).set_display_name('total')
    fraction = divide(a=total.output, b=c)

from kfp import compiler

compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')

Expected result

The pipeline should run successfully with the set_display_name method used.

Actual result

The pipeline fails in the divide component with this error message:

I0905 21:49:34.529181      21 main.go:108] input ComponentSpec:{
  "executorLabel": "exec-divide",
  "inputDefinitions": {
    "parameters": {
      "a": {
        "parameterType": "NUMBER_DOUBLE"
      },
      "b": {
        "parameterType": "NUMBER_DOUBLE"
      }
    }
  },
  "outputDefinitions": {
    "parameters": {
      "Output": {
        "parameterType": "NUMBER_DOUBLE"
      }
    }
  }
}
I0905 21:49:34.530368      21 main.go:115] input TaskSpec:{
  "cachingOptions": {
    "enableCache": true
  },
  "componentRef": {
    "name": "comp-divide"
  },
  "dependentTasks": [
    "addition"
  ],
  "inputs": {
    "parameters": {
      "a": {
        "taskOutputParameter": {
          "outputParameterKey": "Output",
          "producerTask": "addition"
        }
      },
      "b": {
        "componentInputParameter": "c"
      }
    }
  },
  "taskInfo": {
    "name": "divide"
  }
}
I0905 21:49:34.531025      21 main.go:121] input ContainerSpec:{
  "args": [
    "--executor_input",
    "{{$}}",
    "--function_to_execute",
    "divide"
  ],
  "command": [
    "sh",
    "-c",
    "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.8.0' '--no-deps' 'typing-extensions\u003e=3.7.4,\u003c5; python_version\u003c\"3.9\"' \u0026\u0026 \"$0\" \"$@\"\n",
    "sh",
    "-ec",
    "program_path=$(mktemp -d)\n\nprintf \"%s\" \"$0\" \u003e \"$program_path/ephemeral_component.py\"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         \"$program_path/ephemeral_component.py\"                         \"$@\"\n",
    "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import *\n\ndef divide(a: float, b: float) -\u003e float:\n    print(\"hi\")\n    return a / b\n\n"
  ],
  "image": "python:3.8"
}
I0905 21:49:34.531735      21 cache.go:139] Cannot detect ml-pipeline in the same namespace, default to ml-pipeline.kubeflow:8887 as KFP endpoint.
I0905 21:49:34.531764      21 cache.go:116] Connecting to cache endpoint ml-pipeline.kubeflow:8887
I0905 21:49:34.607730      21 client.go:302] Pipeline Context: id:17447  name:"hello-pipeline"  type_id:26  type:"system.Pipeline"  create_time_since_epoch:1725572300413  last_update_time_since_epoch:1725572300413
I0905 21:49:34.667931      21 client.go:311] Pipeline Run Context: id:17449  name:"ce7a10c3-b5e4-4222-ae34-919cb179134c"  type_id:27  type:"system.PipelineRun"  custom_properties:{key:"namespace"  value:{string_value:"milos-grubjesic"}}  custom_properties:{key:"pipeline_root"  value:{string_value:"minio://kubeflow-content/v2/artifacts/hello-pipeline/ce7a10c3-b5e4-4222-ae34-919cb179134c"}}  custom_properties:{key:"resource_name"  value:{string_value:"run-resource"}}  custom_properties:{key:"store_session_info"  value:{string_value:"{\"Provider\":\"minio\",\"Params\":{\"accessKeyKey\":\"accesskey\",\"disableSSL\":\"true\",\"endpoint\":\"minio-service.kubeflow:9000\",\"fromEnv\":\"false\",\"region\":\"minio\",\"secretKeyKey\":\"secretkey\",\"secretName\":\"mlpipeline-minio-artifact\"}}"}}  create_time_since_epoch:1725572932246  last_update_time_since_epoch:1725572932246
I0905 21:49:35.060625      21 driver.go:252] parent DAG: id:461494  name:"run/ce7a10c3-b5e4-4222-ae34-919cb179134c"  type_id:239  type:"system.DAGExecution"  last_known_state:RUNNING  custom_properties:{key:"display_name"  value:{string_value:""}}  custom_properties:{key:"inputs"  value:{struct_value:{fields:{key:"a"  value:{number_value:1}}  fields:{key:"b"  value:{number_value:2}}  fields:{key:"c"  value:{number_value:3}}}}}  custom_properties:{key:"task_name"  value:{string_value:""}}  create_time_since_epoch:1725572932561  last_update_time_since_epoch:1725572932561
I0905 21:49:35.176022      21 driver.go:926] parent DAG input parameters: map[a:number_value:1 b:number_value:2 c:number_value:3], artifacts: map[]
F0905 21:49:35.233819      21 main.go:79] KFP driver: driver.Container(pipelineName=hello-pipeline, runID=ce7a10c3-b5e4-4222-ae34-919cb179134c, task="divide", component="comp-divide", dagExecutionID=461494, componentSpec) failed: failed to resolve inputs: resolving input parameter a with spec task_output_parameter:{producer_task:"addition"  output_parameter_key:"Output"}: cannot find producer task "addition"
time="2024-09-05T21:49:35.435Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-09-05T21:49:35.435Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-09-05T21:49:35.435Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-09-05T21:49:35.435Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
Error: exit status 1

Impacted by this bug? Give it a 👍.

milosjava avatar Sep 05 '24 22:09 milosjava

As a workaround, you can write a Python wrapper for the component. The real problem is in the api-server and Argo Workflows: there is a mismatch between names.
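One rough way to look for such a name mismatch is to compare each task's key with its taskInfo.name in the compiled spec. The sketch below works on a plain dict shaped like the TaskSpec in the log above. It assumes the compiled YAML keeps tasks in a mapping (such as root.dag.tasks) and that set_display_name writes to taskInfo.name; neither is confirmed in this thread. The "addition" entry and the helper name renamed_producers are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def renamed_producers(tasks: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return (consumer, input, producer_key, display_name) for every input
    whose producer task has a taskInfo.name different from its key."""
    hits = []
    for consumer, spec in tasks.items():
        params = spec.get("inputs", {}).get("parameters", {})
        for input_name, param in params.items():
            producer = param.get("taskOutputParameter", {}).get("producerTask")
            if producer is None or producer not in tasks:
                continue  # not a task output, or producer key is unknown
            display = tasks[producer].get("taskInfo", {}).get("name", producer)
            if display != producer:
                hits.append((consumer, input_name, producer, display))
    return hits


tasks = {
    # Assumed shape for the renamed task; this entry is not taken from the log.
    "addition": {"taskInfo": {"name": "total"}},
    # Copied from the TaskSpec printed by the driver for the divide task.
    "divide": {
        "dependentTasks": ["addition"],
        "inputs": {
            "parameters": {
                "a": {
                    "taskOutputParameter": {
                        "outputParameterKey": "Output",
                        "producerTask": "addition",
                    }
                },
                "b": {"componentInputParameter": "c"},
            }
        },
        "taskInfo": {"name": "divide"},
    },
}

print(renamed_producers(tasks))
```

A non-empty result only flags inputs that depend on a task whose display name differs from its key; it does not prove that this is what the driver trips over.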

sanchesoon avatar Sep 10 '24 20:09 sanchesoon

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 10 '24 07:11 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Dec 01 '24 07:12 github-actions[bot]

/reopen Still an issue.

lewmatcin avatar Dec 03 '24 12:12 lewmatcin

@lewmatcin: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen Still an issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Dec 03 '24 12:12 google-oss-prow[bot]

/reopen

milosjava avatar Dec 03 '24 15:12 milosjava

@milosjava: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Dec 03 '24 15:12 google-oss-prow[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Feb 03 '25 07:02 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Feb 24 '25 07:02 github-actions[bot]

My team is testing with KFP version 2.5.0 and KFP SDK 2.13, and we are still seeing this issue.

I plan to start working on a solution to solve this for all components as I had encountered it a while ago during another PR.

zazulam avatar May 22 '25 16:05 zazulam

@zazulam, yes, I am testing with the same versions (KFP version 2.5.0 and KFP SDK 2.13), and the issue is unfortunately still present. It is blocking our migration efforts. Please let me know if I can help with resolving it.

milosjava avatar May 23 '25 13:05 milosjava

/assign @zazulam

zazulam avatar May 27 '25 14:05 zazulam

I faced the same issue and resolved it by explicitly matching the producer task name in the compiled pipeline YAML.

Kubeflow automatically converts underscores _ in Python task names to hyphens - in the final YAML (e.g., fetch_component becomes fetch-component).

If downstream components reference the task using its Python name (with _), they will fail with: cannot find producer task ....

To fix this, explicitly set the display name in the pipeline definition:

fetch_data = fetch_component(...).set_display_name("fetch-component")
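To avoid typing the hyphenated name by hand, a small helper can derive it from the Python name. This is a minimal sketch that only approximates the conversion described above (lowercase, runs of non-alphanumeric characters become a single hyphen); it is not the SDK's own sanitizer, and hyphenated_task_name is a hypothetical name.

```python
import re


def hyphenated_task_name(python_name: str) -> str:
    """Approximate the compiled task name for a Python component name.

    Assumption: lowercase the name and replace each run of characters
    outside [a-z0-9] with one hyphen, trimming hyphens at both ends.
    """
    return re.sub(r"[^a-z0-9]+", "-", python_name.lower()).strip("-")


print(hyphenated_task_name("fetch_component"))  # fetch-component
```

The result could then be passed to set_display_name in place of the hard-coded string, assuming the workaround above applies to your KFP version.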

TSK208 avatar Jul 03 '25 12:07 TSK208