
[backend] Dataset Artifact unable to be found when `set_display_name` is set

Open · AnteWall opened this issue on Aug 16, 2024 · 2 comments

Environment

  • How did you deploy Kubeflow Pipelines (KFP)? We are using deployKF on a custom k8s cluster.

  • KFP version: 2.1.0

  • KFP SDK version: v2.8.0

Steps to reproduce

When I call `.set_display_name(...)` on a component that produces an output artifact, the pipeline never starts the next component.

The pipeline artifact is saved in object storage under s3://{bucket}/v2/artifacts/{namespace}/{pipeline}/{run_id}/My Comp/, but when the next component tries to fetch the artifact it looks for s3://{bucket}/v2/artifacts/{namespace}/{pipeline}/{run_id}/comp-1/ (which is the `producerTask`).

When I remove `set_display_name`, the only diff in the compiled YAML is `taskInfo.name`, so I suspect the task names are being incorrectly translated to Argo Workflows.
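The mismatch can be sketched in plain Python. This is purely illustrative, not KFP internals: the `artifact_uri` helper and the path template are assumptions based on the paths observed above. The point is that the write side appears to use the display name while the read side uses the canonical task name, so the two URIs diverge.

```python
# Illustrative sketch of the observed path mismatch (not KFP source code).
# Template copied from the paths reported above; placeholders left as-is.
ROOT = "s3://{bucket}/v2/artifacts/{namespace}/{pipeline}/{run_id}"


def artifact_uri(task_name: str) -> str:
    """Hypothetical helper: build the artifact directory URI for a task name."""
    return f"{ROOT}/{task_name}/"


# The producer seems to write under the display name...
write_uri = artifact_uri("Create Data")
# ...but the consumer resolves the path from producerTask (the canonical name).
read_uri = artifact_uri("data-get")

assert write_uri != read_uri  # the divergence that makes the artifact unfindable
```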

Expected result

I expect to be able to set `set_display_name` and have the task correctly save and pick up artifacts.

Materials and Reference

NOT WORKING PIPELINE

from kfp import dsl
from kfp.dsl import InputPath, Markdown, Output, OutputPath


@dsl.component(base_image="python:3.11")
def data_get(dataset: OutputPath("dataset")):
    with open(dataset, "w") as f:
        f.write('{"data": "Hello, world!"}')


@dsl.component()
def log_metrics(input: InputPath("dataset"), metrics: Output[Markdown]):
    import json

    with open(input, "r") as f:
        data = json.loads(f.read())
        with open(metrics.path, "w") as f:
            f.write(f"# Metrics\n\n- data: {data['data']}")


@dsl.pipeline(
    name="pipeline-5",
    description="pipeline-5 description",
)
def pipeline_5():
    data = data_get().set_display_name("Create Data")
    preprocess_op = log_metrics(input=data.outputs["dataset"]).set_display_name(
        "Log Metrics"
    )

NOT WORKING (YAML)

# PIPELINE DEFINITION
# Name: pipeline-5
# Description: pipeline-5 description
components:
  comp-data-get:
    executorLabel: exec-data-get
    outputDefinitions:
      artifacts:
        dataset:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
  comp-log-metrics:
    executorLabel: exec-log-metrics
    inputDefinitions:
      artifacts:
        input:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
    outputDefinitions:
      artifacts:
        metrics:
          artifactType:
            schemaTitle: system.Markdown
            schemaVersion: 0.0.1
deploymentSpec:
  executors:
    exec-data-get:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - data_get
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef data_get(dataset: OutputPath(\"dataset\")):\n    with open(dataset,\
          \ \"w\") as f:\n        f.write('{\"data\": \"Hello, world!\"}')\n\n"
        image: python:3.11
    exec-log-metrics:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - log_metrics
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef log_metrics(input: InputPath(\"dataset\"), metrics: Output[Markdown]):\n\
          \    import json\n\n    with open(input, \"r\") as f:\n        data = json.loads(f.read())\n\
          \        with open(metrics.path, \"w\") as f:\n            f.write(f\"#\
          \ Metrics\\n\\n- data: {data['data']}\")\n\n"
        image: python:3.7
pipelineInfo:
  description: pipeline-5 description
  name: pipeline-5
root:
  dag:
    tasks:
      data-get:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-data-get
        taskInfo:
          name: Create Data
      log-metrics:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-log-metrics
        dependentTasks:
        - data-get
        inputs:
          artifacts:
            input:
              taskOutputArtifact:
                outputArtifactKey: dataset
                producerTask: data-get
        taskInfo:
          name: Log Metrics
schemaVersion: 2.1.0
sdkVersion: kfp-2.7.0

Impacted by this bug? Give it a 👍.

AnteWall · Aug 16, 2024