feat(sdk): auto-detect and install Kubeflow SDK for component functions #12027

Open Prateekbala opened this issue 2 months ago • 18 comments

Description

This PR implements automatic Kubeflow SDK integration for KFP components as requested in #12027. It adds the ability to automatically detect and install the Kubeflow SDK when components use kubeflow imports, while also providing explicit extras for opt-in installation.

Changes Made

1. Added Kubeflow Extras to setup.py

  • Added kubeflow = ['kubeflow'] extras option
  • Included kubeflow in the 'all' extras bundle
  • Allows users to install with pip install kfp[kubeflow] or pip install kfp[all]
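
A rough sketch of how the extras described above could be declared. Only the kubeflow entry and its inclusion in 'all' come from this description; the variable name EXTRAS_REQUIRE and the second entry are placeholders for illustration, not the actual contents of the KFP setup.py.

```python
# Hypothetical excerpt of an extras declaration for setup.py.
# Only the 'kubeflow' extra and its presence in 'all' are taken from the PR
# description; everything else is a placeholder.
EXTRAS_REQUIRE = {
    'kubeflow': ['kubeflow'],
    'example-extra': ['example-package'],  # placeholder, not a real kfp extra
}

# Build 'all' as the de-duplicated, sorted union of every other extra.
EXTRAS_REQUIRE['all'] = sorted(
    {pkg for pkgs in EXTRAS_REQUIRE.values() for pkg in pkgs})

# In setup.py this dict would be passed as
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE).
```

Deriving 'all' from the other entries keeps the bundle from drifting when a new extra is added.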

2. Implemented AST-based Auto-detection

  • Added _detect_kubeflow_imports_in_function() in component_factory.py
  • Uses Abstract Syntax Tree parsing (ast + inspect + textwrap.dedent) to detect kubeflow imports in component functions
  • Supports multiple import patterns:
    • import kubeflow
    • import kubeflow.<submodule>
    • from kubeflow import <symbol>
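
The detection step above can be sketched roughly as follows. This is not the PR's _detect_kubeflow_imports_in_function(); the function names here are hypothetical, and the sketch only assumes the same building blocks the description names (ast, inspect, textwrap.dedent).

```python
import ast
import inspect
import textwrap


def source_imports_kubeflow(source: str) -> bool:
    """Return True if the source imports the top-level 'kubeflow' package."""
    try:
        tree = ast.parse(textwrap.dedent(source))
    except (SyntaxError, ValueError):
        return False  # unparsable source: report no detection
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Covers `import kubeflow` and `import kubeflow.<submodule>`.
            if any(a.name.split('.')[0] == 'kubeflow' for a in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # Covers `from kubeflow import x` and `from kubeflow.sub import x`;
            # relative imports (level > 0) are ignored.
            if (node.level == 0 and node.module
                    and node.module.split('.')[0] == 'kubeflow'):
                return True
    return False


def function_imports_kubeflow(func) -> bool:
    """Inspect a function's source; report no detection if unavailable."""
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        return False
    return source_imports_kubeflow(source)
```

Comparing only the first dotted segment means a package such as kubeflow_utils is not mistaken for kubeflow.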

3. Automatic Package Installation

  • Modified _get_packages_to_install_command(...) to auto-detect kubeflow usage
  • Automatically adds 'kubeflow' to packages_to_install when detected and not already specified
  • Respects explicit user-provided packages and version pins (no duplication)
  • Recognizes kubeflow when specified via VCS URLs
  • Fails closed: if source cannot be inspected or parsed, no auto-add occurs
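
A minimal sketch of the "add only if not already specified" rule listed above. The helper names and the enabled flag are hypothetical, and the VCS handling is an assumption: it recognizes only the "name @ url" form and the "#egg=name" fragment, which may differ from what the PR actually parses.

```python
import re
from typing import List, Optional

_NAME_RE = re.compile(r'^([A-Za-z0-9][A-Za-z0-9._-]*)')
_EGG_RE = re.compile(r'[#&]egg=([A-Za-z0-9._-]+)')


def _normalize(name: str) -> str:
    # Treat runs of '-', '_' and '.' as equivalent and ignore case.
    return re.sub(r'[-_.]+', '-', name).lower()


def requirement_name(requirement: str) -> Optional[str]:
    """Best-effort extraction of the distribution name from a requirement."""
    req = requirement.strip()
    egg = _EGG_RE.search(req)
    if egg:
        return _normalize(egg.group(1))
    if '://' in req and '@' not in req.split('://', 1)[0]:
        return None  # bare URL without a name; cannot tell what it installs
    match = _NAME_RE.match(req)
    return _normalize(match.group(1)) if match else None


def with_kubeflow(packages: Optional[List[str]], detected: bool,
                  enabled: bool = True) -> List[str]:
    """Append 'kubeflow' unless disabled, undetected, or already listed."""
    result = list(packages or [])
    if not (enabled and detected):
        return result
    if any(requirement_name(p) == 'kubeflow' for p in result):
        return result  # keep the user's own pin or URL untouched
    return result + ['kubeflow']
```

The enabled flag stands in for the per-component opt-out, so a user-supplied pin such as kubeflow==0.2.0 is never overridden or duplicated.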

4. Opt-out Control per Component

  • Extended @dsl.component with install_kubeflow_package: bool = True
    • True (default): auto-add kubeflow if user code imports it
    • False: never auto-add kubeflow for that component

5. Comprehensive Test Coverage

  • Added test cases covering:
    • All supported kubeflow import patterns and negative cases
    • Behavior when source inspection fails or syntax is invalid
    • Package parsing (versions, extras, VCS URLs) and duplicate avoidance
    • Decorator integration ensuring kubeflow is only installed when needed

Usage

Install via extras:

pip install "kfp[kubeflow]"
pip install "kfp[all]"

Default auto-detection (no user change needed):

from kfp import dsl

@dsl.component
def my_comp():
    import kubeflow
    ...
# Kubeflow SDK is added to packages_to_install automatically

Opt-out:

from kfp import dsl

@dsl.component(install_kubeflow_package=False)
def my_comp():
    import kubeflow
    ...
# Kubeflow SDK will not be auto-added

Related Issues

Fixes #12027

Prateekbala avatar Oct 15 '25 15:10 Prateekbala

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign chensun for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Oct 15 '25 15:10 google-oss-prow[bot]

Hi @Prateekbala. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Oct 15 '25 15:10 google-oss-prow[bot]

🚫 This command cannot be processed. Only organization members or owners can use the commands.

github-actions[bot] avatar Oct 15 '25 15:10 github-actions[bot]

@Prateekbala could you please add a sign off to your commit?

mprahl avatar Oct 15 '25 17:10 mprahl

@Prateekbala could you please add a sign off to your commit?

I've added the sign-off to my commit. Thank you for pointing that out.

Prateekbala avatar Oct 16 '25 06:10 Prateekbala

I've implemented the requested changes

Prateekbala avatar Nov 02 '25 20:11 Prateekbala

Please rebase so that there aren't any merge commits.

mprahl avatar Nov 05 '25 15:11 mprahl

Please rebase so that there aren't any merge commits.

Done

Prateekbala avatar Nov 06 '25 14:11 Prateekbala

It adds the ability to automatically detect and install the Kubeflow SDK when components use kubeflow imports, while also providing explicit extras for opt-in installation.

@Prateekbala @mprahl I generally dislike over-smart behaviors like this. Kubeflow imports inside users' component code are no different from any other library imports, and it should be the users' responsibility to specify the dependencies they want to install.

We should stick to standard Python dependency management. Inferring dependencies from imports is problematic and rarely the right approach.

In practice, as the SDK evolves, there could be breaking changes, the version you install may not actually work with the user code, and there could be transitive dependencies incompatible with user-specified dependencies. That is one of the reasons we opted to install kfp with no dependencies.

chensun avatar Nov 13 '25 21:11 chensun

Hi @chensun, I understand the concern around inferring dependencies. In the earlier discussion on #12027 and in the Kubeflow SDK design thread, @mprahl proposed using an AST-based check to detect kubeflow imports, alongside a kubeflow extras option. The direction in that conversation was to implement the AST-based detection together with the extras entry as an initial solution, with the expectation that it could be revised as the broader Kubeflow SDK integration work evolves.

Prateekbala avatar Nov 15 '25 17:11 Prateekbala

I'm wondering if we should revisit this given @chensun's concerns? Any other concerns from @kubeflow/wg-pipeline-leads about this approach or are we happy to move forward?

kramaranya avatar Nov 17 '25 20:11 kramaranya

@chensun Thanks for the review. You raise valid points regarding the risks of "magic" behavior and dependency management. I definitely agree that we want to avoid inferring dependencies for general libraries.

However, I believe the Kubeflow SDK warrants a special exception separate from standard third-party packages. Our goal is to encourage seamless usage of the wider Kubeflow ecosystem within components. Conceptually, this somewhat aligns with how we already pre-install the kfp SDK into the container at runtime; we are simply extending that convenience to the broader Kubeflow namespace.

To address your concerns about version drift and user control, I propose the following safeguards for this PR:

  • Respecting Opt-Outs: If the user explicitly disables kfp package installation in the component decorator, we should also disable the automatic detection/installation of the Kubeflow SDK.
  • Strict Version Pinning: We could pin the Kubeflow SDK version to an x.y version in setup of the KFP SDK. This ensures the injected SDK is always aligned with the runtime environment, mitigating the risk of breaking changes or incompatibility.
  • Best-Effort Detection: We could keep the autodetect logic for kubeflow imports, but it will only trigger the installation of the pinned, compatible version defined above.
  • CI Verification: Let's add a pipeline run in CI to explicitly test this autodetection and installation path, ensuring the pinned version installs successfully without conflict.
  • Visibility: We should log in the executor that Kubeflow SDK usage was detected and it will be automatically installed. We can also provide instructions to disable it.
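
The safeguards listed above could combine roughly as in the sketch below. Everything here is an assumption for illustration: the function name, the constant, and the placement of the log calls are hypothetical, and the version range is only the candidate floated later in this thread, not a decided pin.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Placeholder pin for illustration; the actual range was still under discussion.
KUBEFLOW_SDK_REQUIREMENT = 'kubeflow>=0.2.0,<0.3.0'


def resolve_kubeflow_requirement(
        detected: bool,
        install_kfp_package: bool,
        install_kubeflow_package: bool = True) -> Optional[str]:
    """Return the pinned requirement to inject, or None if nothing is added."""
    if not detected:
        return None
    if not install_kfp_package or not install_kubeflow_package:
        # Opting out of kfp installation also opts out of the Kubeflow SDK.
        logger.info('Kubeflow SDK import detected, but automatic '
                    'installation is disabled for this component.')
        return None
    logger.info(
        'Kubeflow SDK import detected; adding %s. '
        'Set install_kubeflow_package=False to disable this.',
        KUBEFLOW_SDK_REQUIREMENT)
    return KUBEFLOW_SDK_REQUIREMENT
```

Returning None in every non-install case keeps the caller's logic to a single check, and the log message tells the user both what happened and how to turn it off.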

mprahl avatar Nov 18 '25 21:11 mprahl

Hey @mprahl, I've addressed the review feedback and added some safeguards based on the concerns. A few things I need clarification on:

  1. Kubeflow SDK version - Which version should we actually pin to? Is it 1.9.0? Should it match a specific KFP version or use a range like >=1.9.0,<2.0?

  2. Logging - Where should we log the auto-detection? At compile time when building the spec, or at runtime in the executor?

  3. Default mode - Currently AUTO is the default. Given the concerns, should we start with SKIP as the default instead and let users opt in?

Prateekbala avatar Nov 19 '25 12:11 Prateekbala

Hey @mprahl, I've addressed the review feedback and added some safeguards based on the concerns. A few things I need clarification on:

  1. Kubeflow SDK version - Which version should we actually pin to? Is it 1.9.0? Should it match a specific KFP version or use a range like >=1.9.0,<2.0?

I think we can keep the range >=0.2.0,<0.3.0 for now and bump it every KFP release. @andreyvelich @kramaranya do you agree with that?

  2. Logging - Where should we log the auto-detection? At compile time when building the spec, or at runtime in the executor?

I was thinking the executor, so the user can have debug information about which Kubeflow SDK was used at runtime.

  3. Default mode - Currently AUTO is the default. Given the concerns, should we start with SKIP as the default instead and let users opt in?

I think if the install_kfp_package argument is True, we should default to AUTO. Otherwise, let the user explicitly specify the behavior.

mprahl avatar Nov 19 '25 14:11 mprahl

I think we can keep the range >=0.2.0,<0.3.0 for now and bump it every KFP release. @andreyvelich @kramaranya do you agree with that?

I think we can just keep >=0.2.0 and it should install the latest version, so we don't need to bump it every time

kramaranya avatar Nov 19 '25 16:11 kramaranya

I think we can just keep >=0.2.0 and it should install the latest version, so we don't need to bump it every time

@kramaranya If we are planning to introduce breaking changes between minor releases, it might be better to stick with <0.3 for now so that we don't break users' pipelines after an upgrade.
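
To make the difference between the two ranges concrete, here is a simplified sketch that checks a few made-up version numbers against each specifier. The matcher is hand-written for plain numeric versions only; real code would more likely rely on the packaging library's SpecifierSet.

```python
import operator
import re

_OPS = {'>=': operator.ge, '<=': operator.le, '==': operator.eq,
        '>': operator.gt, '<': operator.lt}
_CLAUSE_RE = re.compile(r'\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)\s*')


def _as_tuple(version, width=3):
    # '0.3' -> (0, 3, 0) so that '0.3' and '0.3.0' compare as equal.
    parts = [int(p) for p in version.split('.')]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, specifier):
    """Check a plain numeric version against comma-separated clauses."""
    for clause in specifier.split(','):
        match = _CLAUSE_RE.fullmatch(clause)
        if match is None:
            raise ValueError(f'unsupported clause: {clause!r}')
        op, bound = match.groups()
        if not _OPS[op](_as_tuple(version), _as_tuple(bound)):
            return False
    return True


# Illustrative version numbers only; not a list of actual releases.
candidates = ['0.2.0', '0.2.5', '0.3.0', '1.0.0']
bounded = [v for v in candidates if satisfies(v, '>=0.2.0,<0.3.0')]
open_ended = [v for v in candidates if satisfies(v, '>=0.2.0')]
```

With the upper bound, a later 0.3.0 release is excluded until the range is bumped; without it, the newest release is always eligible, including one with breaking changes.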

andreyvelich avatar Nov 19 '25 18:11 andreyvelich

Thank you very much for your efforts @Prateekbala!

We discussed this PR during today's KFP community call. The maintainers have decided to put a pause on this effort due to a versioning concern:

  • The Concern: The KFP SDK version used to compile a pipeline does not necessarily match the active KFP deployment.
  • The Risk: If we pin the Kubeflow SDK at the KFP SDK level, we risk installing a Kubeflow SDK that is incompatible with the actual Kubeflow deployment.

Alternative Proposal

Instead, we believe this should be handled as an API Server concern, not a KFP SDK concern. The system should ideally resolve the associated Kubeflow SDK version (using major.minor/x.y alignment) based on the deployed KFP version. This ensures we always use the SDKs compatible with what is actually installed.

Since this architectural change has significant repercussions, we need a member of the KFP community to propose a KEP (Kubeflow Enhancement Proposal) to define this path forward.
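
Purely as an illustration of the alternative proposal above, resolution keyed on the deployed KFP version might look like the sketch below. Nothing here is a designed or existing API: the table, its entries, and the function name are invented placeholders, and the real mechanism would be defined by the KEP.

```python
from typing import Optional

# Placeholder compatibility table: deployed KFP (major, minor) -> Kubeflow SDK
# range. Every entry is invented for illustration.
COMPATIBILITY = {
    (2, 14): 'kubeflow>=0.2.0,<0.3.0',
    (2, 15): 'kubeflow>=0.3.0,<0.4.0',
}


def kubeflow_requirement_for(deployed_kfp_version: str) -> Optional[str]:
    """Look up the SDK range by the deployment's major.minor version."""
    major, minor = (int(p) for p in deployed_kfp_version.split('.')[:2])
    return COMPATIBILITY.get((major, minor))
```

The point of the sketch is only that the lookup key is the version of the running deployment rather than the version of the KFP SDK used to compile the pipeline.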

mprahl avatar Nov 19 '25 20:11 mprahl

Thank you very much for your efforts @Prateekbala!

We discussed this PR during today's KFP community call. The maintainers have decided to put a pause on this effort due to a versioning concern:

Thanks for the update @mprahl! I’ll pause the PR for now. Happy to help and contribute more.

Prateekbala avatar Nov 19 '25 20:11 Prateekbala