Using `set_display_name` results in "Cannot find producer task" error in Kubeflow Pipelines
Environment
Kubeflow version: 1.9
KFP SDK version: 2.8.0
Backend: Argo
Steps to reproduce
- Define two components (addition and divide) and create a pipeline (hello_pipeline) that chains them together.
- Use the `set_display_name` method to set a display name for the first component (addition).
- Compile the pipeline and run it. The pipeline fails because the output from the first component is not available to the second component (divide). Please check the error log at the end of this issue.
Here is the Python code that reproduces the issue:
```python
from kfp import dsl

@dsl.component
def addition(a: float, b: float) -> float:
    print("hi")
    return a + b

@dsl.component
def divide(a: float, b: float) -> float:
    print("hi")
    return a / b

@dsl.pipeline
def hello_pipeline(a: float = 1, b: float = 2, c: float = 3) -> None:
    total = addition(a=a, b=b).set_display_name('total')
    fraction = divide(a=total.output, b=c)

from kfp import compiler
compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')
```
Expected result
The pipeline should run successfully when the `set_display_name` method is used.
Actual result
Pipeline fails in divide component with error message:
I0905 21:49:34.529181 21 main.go:108] input ComponentSpec:{
"executorLabel": "exec-divide",
"inputDefinitions": {
"parameters": {
"a": {
"parameterType": "NUMBER_DOUBLE"
},
"b": {
"parameterType": "NUMBER_DOUBLE"
}
}
},
"outputDefinitions": {
"parameters": {
"Output": {
"parameterType": "NUMBER_DOUBLE"
}
}
}
}
I0905 21:49:34.530368 21 main.go:115] input TaskSpec:{
"cachingOptions": {
"enableCache": true
},
"componentRef": {
"name": "comp-divide"
},
"dependentTasks": [
"addition"
],
"inputs": {
"parameters": {
"a": {
"taskOutputParameter": {
"outputParameterKey": "Output",
"producerTask": "addition"
}
},
"b": {
"componentInputParameter": "c"
}
}
},
"taskInfo": {
"name": "divide"
}
}
I0905 21:49:34.531025 21 main.go:121] input ContainerSpec:{
"args": [
"--executor_input",
"{{$}}",
"--function_to_execute",
"divide"
],
"command": [
"sh",
"-c",
"\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.8.0' '--no-deps' 'typing-extensions\u003e=3.7.4,\u003c5; python_version\u003c\"3.9\"' \u0026\u0026 \"$0\" \"$@\"\n",
"sh",
"-ec",
"program_path=$(mktemp -d)\n\nprintf \"%s\" \"$0\" \u003e \"$program_path/ephemeral_component.py\"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path \"$program_path/ephemeral_component.py\" \"$@\"\n",
"\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import *\n\ndef divide(a: float, b: float) -\u003e float:\n print(\"hi\")\n return a / b\n\n"
],
"image": "python:3.8"
}
I0905 21:49:34.531735 21 cache.go:139] Cannot detect ml-pipeline in the same namespace, default to ml-pipeline.kubeflow:8887 as KFP endpoint.
I0905 21:49:34.531764 21 cache.go:116] Connecting to cache endpoint ml-pipeline.kubeflow:8887
I0905 21:49:34.607730 21 client.go:302] Pipeline Context: id:17447 name:"hello-pipeline" type_id:26 type:"system.Pipeline" create_time_since_epoch:1725572300413 last_update_time_since_epoch:1725572300413
I0905 21:49:34.667931 21 client.go:311] Pipeline Run Context: id:17449 name:"ce7a10c3-b5e4-4222-ae34-919cb179134c" type_id:27 type:"system.PipelineRun" custom_properties:{key:"namespace" value:{string_value:"milos-grubjesic"}} custom_properties:{key:"pipeline_root" value:{string_value:"minio://kubeflow-content/v2/artifacts/hello-pipeline/ce7a10c3-b5e4-4222-ae34-919cb179134c"}} custom_properties:{key:"resource_name" value:{string_value:"run-resource"}} custom_properties:{key:"store_session_info" value:{string_value:"{\"Provider\":\"minio\",\"Params\":{\"accessKeyKey\":\"accesskey\",\"disableSSL\":\"true\",\"endpoint\":\"minio-service.kubeflow:9000\",\"fromEnv\":\"false\",\"region\":\"minio\",\"secretKeyKey\":\"secretkey\",\"secretName\":\"mlpipeline-minio-artifact\"}}"}} create_time_since_epoch:1725572932246 last_update_time_since_epoch:1725572932246
I0905 21:49:35.060625 21 driver.go:252] parent DAG: id:461494 name:"run/ce7a10c3-b5e4-4222-ae34-919cb179134c" type_id:239 type:"system.DAGExecution" last_known_state:RUNNING custom_properties:{key:"display_name" value:{string_value:""}} custom_properties:{key:"inputs" value:{struct_value:{fields:{key:"a" value:{number_value:1}} fields:{key:"b" value:{number_value:2}} fields:{key:"c" value:{number_value:3}}}}} custom_properties:{key:"task_name" value:{string_value:""}} create_time_since_epoch:1725572932561 last_update_time_since_epoch:1725572932561
I0905 21:49:35.176022 21 driver.go:926] parent DAG input parameters: map[a:number_value:1 b:number_value:2 c:number_value:3], artifacts: map[]
F0905 21:49:35.233819 21 main.go:79] KFP driver: driver.Container(pipelineName=hello-pipeline, runID=ce7a10c3-b5e4-4222-ae34-919cb179134c, task="divide", component="comp-divide", dagExecutionID=461494, componentSpec) failed: failed to resolve inputs: resolving input parameter a with spec task_output_parameter:{producer_task:"addition" output_parameter_key:"Output"}: cannot find producer task "addition"
time="2024-09-05T21:49:35.435Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-09-05T21:49:35.435Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-09-05T21:49:35.435Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-09-05T21:49:35.435Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
Error: exit status 1
Impacted by this bug? Give it a 👍.
As a workaround, you can write a Python wrapper for the component. The real problem is in the api-server + Argo Workflows: there is a mismatch between the task names.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen Still an issue.
@lewmatcin: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen Still an issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@milosjava: Reopened this issue.
In response to this:
/reopen
My team is testing with KFP 2.5.0 and KFP SDK 2.13, and we are still seeing this issue.
I plan to start working on a solution to solve this for all components as I had encountered it a while ago during another PR.
@zazulam, yes, I am testing with the same versions (KFP 2.5.0 and KFP SDK 2.13), and the issue is unfortunately still present. It's not very pleasant, and it's blocking our migration efforts. Please let me know if I can assist you in any way with the effort to resolve this issue.
/assign @zazulam
I faced the same issue and resolved it by explicitly matching the producer task name in the compiled pipeline YAML.
Kubeflow automatically converts underscores (`_`) in Python task names to hyphens (`-`) in the final YAML (e.g., `fetch_component` becomes `fetch-component`). If downstream components reference the task by its Python name (with `_`), they fail with:
`cannot find producer task ...`
To fix this, explicitly set the display name in the pipeline definition:
```python
fetch_data = fetch_component(...).set_display_name("fetch-component")
```
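To see why the reference breaks, here is a minimal pure-Python sketch (no KFP required) of the kind of name sanitization described above. `sanitize_task_name` is a hypothetical stand-in for illustration, not the actual KFP compiler function:

```python
import re

def sanitize_task_name(name: str) -> str:
    """Hypothetical stand-in for the compiler's sanitization:
    lower-case the name and replace underscores with hyphens."""
    return re.sub(r"_", "-", name.lower())

# The compiled spec stores tasks under their sanitized names.
compiled_tasks = {sanitize_task_name("fetch_component"): {"output": 42}}

# A downstream reference that still uses the Python name misses the
# entry, which surfaces at runtime as "cannot find producer task".
python_name = "fetch_component"
print(python_name in compiled_tasks)                      # False
print(sanitize_task_name(python_name) in compiled_tasks)  # True
```

This is only meant to illustrate the lookup mismatch between the Python-side name and the sanitized name in the compiled YAML.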