[backend] pipeline fails to execute when passing optional parameter with no default value and no user-supplied value

Open ishaan-mehta opened this issue 10 months ago • 4 comments

When you create an optional pipeline parameter with a default value of None with the KFP Python SDK, and then you create a run where you don't pass a value, any components where that parameter is passed will fail to execute.

So for example, if you create a pipeline with foo as an input parameter of type Optional[str] and set the default value (in the function) as None, when you pass foo to a component, that component will fail to execute with these errors in the deploykf/kubeflow-pipelines/kfp-driver:2.1.0-deploykf.0 container:

I0224 21:16:18.566977      20 cache.go:116] Connecting to cache endpoint ml-pipeline.kubeflow.svc.cluster.local:8887
I0224 21:16:18.582757      20 client.go:290] Pipeline Context: id:10 name:"foo-pipeline" type_id:11 type:"system.Pipeline" create_time_since_epoch:1740431765635 last_update_time_si
I0224 21:16:18.592025      20 client.go:298] Pipeline Run Context: id:11 name:"93d1bba6-92c2-42ad-8102-c484ccd90c6b" type_id:12 type:"system.PipelineRun" custom_properties:{key:"na
I0224 21:16:18.635209      20 driver.go:239] parent DAG: id:15 name:"run/93d1bba6-92c2-42ad-8102-c484ccd90c6b" type_id:13 type:"system.DAGExecution" last_known_state:RUNNING custom
I0224 21:16:18.653276      20 driver.go:870] parent DAG input parameters: map[], artifacts: map[]
F0224 21:16:18.653324      20 main.go:79] KFP driver: driver.Container(pipelineName=foo-pipeline, runID=93d1bba6-92c2-42ad-8102-c484ccd90c6b, task="print-foo", component="comp-prin
time="2025-02-24T21:16:19.518Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2025-02-24T21:16:19.518Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2025-02-24T21:16:19.518Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2025-02-24T21:16:19.518Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
Error: exit status 1
stream closed EOF for team-1/foo-pipeline-nhxzc-2689883979 (main)

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?

deployKF v0.1.5

  • KFP version:

2.1.0-deploykf.0

  • KFP SDK version:
kfp                       2.7.0
kfp-pipeline-spec         0.3.0
kfp-server-api            2.0.5

Steps to reproduce

from kfp import dsl
from typing import Optional

@dsl.component(
    base_image="python:3.11",
)
def print_foo(
    foo: Optional[str] = None
):
    print(foo)


@dsl.pipeline
def foo_pipeline(
    foo: Optional[str] = None
):
    print_foo_task = print_foo(
        foo=foo
    )

from kfp.client import Client

client = Client()

run1 = client.create_run_from_pipeline_func(
    foo_pipeline,
    arguments={},
    enable_caching=False,
)

Expected result

The pipeline should either:

  1. Execute successfully, passing None for foo from foo_pipeline to print_foo. This can be the behavior even if we maintain that a Python default value of None is not converted to a parameter defaultValue in the pipeline IR; it would simply be the Python-specific default behavior when an optional pipeline parameter without a defaultValue is passed to a component. In other words, whenever an optional pipeline parameter with no defaultValue is passed to a component, the component function should receive None for that parameter unless the user supplied a value.
  2. If optional pipeline parameters (such as foo) are expected to crash component execution when given neither a default nor a user-supplied value, and Python Nones should not be mapped to a KFP defaultValue (as is the case today), we should do the following:
     a. Show a warning at compile time about the invalid default value (e.g., Warning: You should not assign a default value of `None` to a `string` parameter. This will be treated as if there is no default value. Thus, when this pipeline is run, if no user-supplied value is passed to the pipeline, then any component using this pipeline parameter will crash.)
     b. Show a clearer error at run time explaining that the component execution failed because (1) the pipeline parameter had a None/non-existent default value, (2) no user-supplied value was given for the parameter when creating the run, and (3) the parameter was then passed to a component without a value. The current error message is very unclear for anyone trying to determine what went wrong.
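As a rough illustration of the compile-time warning suggested in option 2a, here is a minimal pure-Python sketch. The helper name `warn_on_none_defaults` is hypothetical and is not part of the KFP SDK; it only inspects a plain function signature and assumes the parameter is annotated with `Optional[...]`.

```python
import inspect
import typing
import warnings
from typing import Optional


def warn_on_none_defaults(func):
    """Warn for each parameter annotated Optional[...] whose default is None.

    Hypothetical helper for illustration only; not part of the KFP SDK.
    Returns the names of the flagged parameters.
    """
    flagged = []
    hints = typing.get_type_hints(func)
    for name, param in inspect.signature(func).parameters.items():
        annotation = hints.get(name)
        # Optional[X] is Union[X, None], so look for NoneType in a Union.
        is_optional = (
            typing.get_origin(annotation) is typing.Union
            and type(None) in typing.get_args(annotation)
        )
        if is_optional and param.default is None:
            warnings.warn(
                f"Parameter '{name}' has a default value of None, which is "
                "treated as if there is no default value. If no value is "
                "supplied when the run is created, components that use this "
                "parameter may fail.",
                UserWarning,
                stacklevel=2,
            )
            flagged.append(name)
    return flagged


# Plain function standing in for the pipeline function from the reproduction.
def foo_pipeline(foo: Optional[str] = None, bar: str = "x"):
    pass
```

Calling `warn_on_none_defaults(foo_pipeline)` emits one `UserWarning` for `foo` and leaves `bar` alone, since `bar` has a concrete default.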

Materials and Reference


Impacted by this bug? Give it a 👍.

ishaan-mehta, Feb 24 '25 22:02

@ishaan-mehta I believe this is fixed by #11788 and will be included in KFP 2.5.

mprahl, Apr 14 '25 19:04

Hi @mprahl, thanks for the fix and the update!

Just out of curiosity, which path was taken? Are we simply passing any non-specified optional parameters as None (assuming they don't have a defaultValue set in the pipeline spec) to the component?

(I did briefly read through your PR changes but just wanted to make sure I am understanding correctly so that it is documented here.)

ishaan-mehta, Apr 15 '25 04:04

@ishaan-mehta that was my original intuition but after reading the code, it seems the intended behavior of the existing code was to let components handle defaults. After the fix, any optional pipeline input parameters not specified are just omitted when passed to a component. So if the input parameter on the component is optional, then the default value at the component level is used. If the input parameter on the component is required, then there is an error indicating the parameter is missing.
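The behavior described in this comment can be modeled in a few lines of plain Python. This is only a sketch of the described semantics, not the actual KFP driver code; the function name `resolve_component_inputs` and the `(is_optional, default)` tuple layout are assumptions made for illustration.

```python
_MISSING = object()  # sentinel meaning "no value available at the pipeline level"


def resolve_component_inputs(component_params, pipeline_values):
    """Model how a component's inputs are resolved after the fix.

    Illustrative only; not the real KFP implementation.

    component_params: dict mapping name -> (is_optional, default)
    pipeline_values:  dict mapping name -> value, holding only the pipeline
                      inputs that actually have a value. Unspecified optional
                      pipeline inputs are simply absent (omitted).
    """
    resolved = {}
    for name, (is_optional, default) in component_params.items():
        value = pipeline_values.get(name, _MISSING)
        if value is not _MISSING:
            # A value was passed down from the pipeline: use it as-is.
            resolved[name] = value
        elif is_optional:
            # Omitted input, optional on the component: use its own default.
            resolved[name] = default
        else:
            # Omitted input, required on the component: report it clearly.
            raise ValueError(f"required input parameter '{name}' is missing")
    return resolved
```

With the reproduction above, `resolve_component_inputs({"foo": (True, None)}, {})` falls through to the component-level default, so `print_foo` would receive `None`; marking `foo` as required on the component instead raises the "missing" error.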

mprahl, Apr 17 '25 18:04

@ishaan-mehta this was released in KFP 2.5. Could you please try to verify if it resolves your issue?

mprahl, Apr 29 '25 20:04