[backend] Dataset Artifact unable to be found when `set_display_name` is set
Environment

- How did you deploy Kubeflow Pipelines (KFP)? We are using deployKF on a custom k8s cluster.
- KFP version: 2.1.0
- KFP SDK version: v2.8.0
Steps to reproduce

When `set_display_name(...)` is set on a component, the pipeline does not start the next component if that component has an output artifact. The artifact is saved to object storage under the display name, e.g. s3://{bucket}/v2/artifacts/{namespace}/{pipeline}/{run_id}/My Comp/. But when the next component tries to fetch the artifact, it looks under the component's task name (the "producerTask"), e.g. s3://{bucket}/v2/artifacts/{namespace}/{pipeline}/{run_id}/comp-1/.

When I remove the `set_display_name` call, the YAML diff is only taskInfo.name, so I guess the task names are being translated to Argo Workflows incorrectly.
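For illustration, this is the only change in the compiled IR when the display name is set (a diff sketch reconstructed from the YAML below; without `set_display_name`, `taskInfo.name` defaults to the task name):

   data-get:
     ...
     taskInfo:
-      name: data-get
+      name: Create Data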
Expected result

I expect to be able to set `set_display_name` and have tasks correctly save and pick up their artifacts.
Materials and Reference
NOT WORKING PIPELINE
from kfp import dsl
from kfp.dsl import InputPath, Markdown, Output, OutputPath


@dsl.component(base_image="python:3.11")
def data_get(dataset: OutputPath("dataset")):
    with open(dataset, "w") as f:
        f.write('{"data": "Hello, world!"}')


@dsl.component()
def log_metrics(input: InputPath("dataset"), metrics: Output[Markdown]):
    import json

    with open(input, "r") as f:
        data = json.loads(f.read())
    with open(metrics.path, "w") as f:
        f.write(f"# Metrics\n\n- data: {data['data']}")


@dsl.pipeline(
    name="pipeline-5",
    description="pipeline-5 description",
)
def pipeline_5():
    data = data_get().set_display_name("Create Data")
    preprocess_op = log_metrics(input=data.outputs["dataset"]).set_display_name(
        "Log Metrics"
    )
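For comparison, a minimal sketch of the same pipeline with the `set_display_name(...)` calls removed; per the report above, this variant runs fine (the pipeline function name here is made up). A compile call like the one at the end produces IR YAML like the listing below:

from kfp import compiler


@dsl.pipeline(name="pipeline-5-no-display-names")
def pipeline_5_no_display_names():
    # Same DAG; taskInfo.name keeps its default value (the task name,
    # e.g. "data-get"), so the save path and the consumer's lookup path agree.
    data = data_get()
    log_metrics(input=data.outputs["dataset"])


# Compiling the failing pipeline to inspect its IR.
compiler.Compiler().compile(
    pipeline_func=pipeline_5,
    package_path="pipeline_5.yaml",
)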
NOT WORKING (YAML)
# PIPELINE DEFINITION
# Name: pipeline-5
# Description: pipeline-5 description
components:
  comp-data-get:
    executorLabel: exec-data-get
    outputDefinitions:
      artifacts:
        dataset:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
  comp-log-metrics:
    executorLabel: exec-log-metrics
    inputDefinitions:
      artifacts:
        input:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
    outputDefinitions:
      artifacts:
        metrics:
          artifactType:
            schemaTitle: system.Markdown
            schemaVersion: 0.0.1
deploymentSpec:
  executors:
    exec-data-get:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - data_get
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)

          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef data_get(dataset: OutputPath(\"dataset\")):\n    with open(dataset,\
          \ \"w\") as f:\n        f.write('{\"data\": \"Hello, world!\"}')\n\n"
        image: python:3.11
    exec-log-metrics:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - log_metrics
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)

          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef log_metrics(input: InputPath(\"dataset\"), metrics: Output[Markdown]):\n\
          \    import json\n\n    with open(input, \"r\") as f:\n        data = json.loads(f.read())\n\
          \    with open(metrics.path, \"w\") as f:\n        f.write(f\"#\
          \ Metrics\\n\\n- data: {data['data']}\")\n\n"
        image: python:3.7
pipelineInfo:
  description: pipeline-5 description
  name: pipeline-5
root:
  dag:
    tasks:
      data-get:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-data-get
        taskInfo:
          name: Create Data
      log-metrics:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-log-metrics
        dependentTasks:
        - data-get
        inputs:
          artifacts:
            input:
              taskOutputArtifact:
                outputArtifactKey: dataset
                producerTask: data-get
        taskInfo:
          name: Log Metrics
schemaVersion: 2.1.0
sdkVersion: kfp-2.7.0
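A quick way to spot the mismatch in any compiled pipeline is to compare each DAG task key with its `taskInfo.name`; a minimal sketch, assuming PyYAML is installed and the `pipeline_5.yaml` file was produced by the compile step above:

import yaml

with open("pipeline_5.yaml") as f:
    spec = yaml.safe_load(f)

for task_key, task in spec["root"]["dag"]["tasks"].items():
    display_name = task["taskInfo"]["name"]
    if display_name != task_key:
        # Per this report, the artifact is saved under the display name
        # but looked up under the task key, hence the missing artifact.
        print(f"{task_key}: taskInfo.name is {display_name!r}")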
Impacted by this bug? Give it a 👍.