feat(sdk): auto-detect and install Kubeflow SDK for component functions #12027
Description
This PR implements automatic Kubeflow SDK integration for KFP components as requested in #12027. It adds the ability to automatically detect and install the Kubeflow SDK when components use kubeflow imports, while also providing explicit extras for opt-in installation.
Changes Made
1. Added Kubeflow Extras to setup.py
- Added kubeflow = ['kubeflow'] extras option
- Included kubeflow in the 'all' extras bundle
- Allows users to install with pip install kfp[kubeflow] or pip install kfp[all]
2. Implemented AST-based Auto-detection
- Added _detect_kubeflow_imports_in_function() in component_factory.py
- Uses Abstract Syntax Tree parsing (ast + inspect + textwrap.dedent) to detect kubeflow imports in component functions
- Supports multiple import patterns:
  - import kubeflow
  - import kubeflow.<submodule>
  - from kubeflow import <symbol>
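The detection described above might look roughly like the following. This is a minimal sketch, not the PR's actual code: the function names `source_imports_kubeflow` and `function_imports_kubeflow` are hypothetical, and the real `_detect_kubeflow_imports_in_function()` may differ in structure and edge-case handling.

```python
import ast
import inspect
import textwrap


def source_imports_kubeflow(source: str) -> bool:
    """Return True if the given source text imports the top-level kubeflow package."""
    try:
        tree = ast.parse(textwrap.dedent(source))
    except SyntaxError:
        # Fail closed: unparseable source is treated as "no kubeflow import".
        return False

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Covers `import kubeflow` and `import kubeflow.<submodule>`.
            if any(alias.name.split('.')[0] == 'kubeflow' for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # Covers `from kubeflow import <symbol>`; level 0 excludes relative imports.
            if node.level == 0 and node.module and node.module.split('.')[0] == 'kubeflow':
                return True
    return False


def function_imports_kubeflow(func) -> bool:
    """Hypothetical wrapper: inspect a function object and scan its source."""
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # Source unavailable (for example, a function defined in a REPL or via exec).
        return False
    return source_imports_kubeflow(source)
```

Matching on the first dotted segment of the module name avoids false positives for unrelated packages whose names merely start with "kubeflow".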
3. Automatic Package Installation
- Modified _get_packages_to_install_command(...) to auto-detect kubeflow usage
- Automatically adds 'kubeflow' to packages_to_install when detected and not already specified
- Respects explicit user-provided packages and version pins (no duplication)
- Recognizes kubeflow when specified via VCS URLs
- Fails closed: if source cannot be inspected or parsed, no auto-add occurs
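As an illustration of the duplicate-avoidance rule, here is a hedged sketch of how a requirement string could be reduced to a package name before deciding whether to append 'kubeflow'. The helpers `requirement_name` and `add_kubeflow_if_needed` are hypothetical names, and the sketch assumes VCS requirements carry either an `#egg=<name>` fragment or the `name @ url` form; the PR's own parsing may recognize other shapes.

```python
import re
from typing import Iterable, List


def requirement_name(spec: str) -> str:
    """Best-effort normalized project name for a pip requirement string."""
    spec = spec.strip()
    egg = re.search(r'[#&]egg=([A-Za-z0-9._-]+)', spec)
    direct = re.match(r'^([A-Za-z0-9._-]+)\s*@\s*\S', spec)
    if egg:
        # e.g. git+https://host/repo.git@ref#egg=kubeflow
        name = egg.group(1)
    elif direct:
        # e.g. kubeflow @ git+https://host/repo.git
        name = direct.group(1)
    elif '://' in spec:
        # A bare URL gives no reliable name; report "unknown".
        return ''
    else:
        # e.g. kubeflow==0.2.0, kubeflow[extra]>=0.2
        name = re.split(r'[\[<>=!~;\s]', spec, maxsplit=1)[0]
    # Normalize runs of '-', '_' and '.' and lowercase, in the spirit of PEP 503.
    return re.sub(r'[-_.]+', '-', name).lower()


def add_kubeflow_if_needed(packages: Iterable[str], uses_kubeflow: bool) -> List[str]:
    """Append 'kubeflow' only when it is used and not already requested."""
    result = list(packages)
    if uses_kubeflow and 'kubeflow' not in {requirement_name(p) for p in result}:
        result.append('kubeflow')
    return result
```

Because the user's own entry is left untouched, an explicit pin such as `kubeflow==0.2.0` always wins over the auto-added, unpinned name.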
4. Opt-out Control per Component
- Extended @dsl.component with install_kubeflow_package: bool = True
  - True (default): auto-add kubeflow if user code imports it
  - False: never auto-add kubeflow for that component
5. Comprehensive Test Coverage
- Added test cases covering:
  - All supported kubeflow import patterns and negative cases
  - Behavior when source inspection fails or syntax is invalid
  - Package parsing (versions, extras, VCS URLs) and duplicate avoidance
  - Decorator integration ensuring kubeflow is only installed when needed
Usage
Install via extras:
pip install kfp[kubeflow]
pip install kfp[all]
Default auto-detection (no user change needed):
@dsl.component
def my_comp(...):
    import kubeflow
    ...
# Kubeflow SDK is added to packages_to_install automatically
Opt-out:
@dsl.component(install_kubeflow_package=False)
def my_comp(...):
    import kubeflow
    ...
# Kubeflow SDK will not be auto-added
Related Issues
Fixes #12027
@Prateekbala could you please add a sign off to your commit?
I've added the sign-off to my commit. Thank you for pointing that out.
I've implemented the requested changes
Please rebase so that there aren't any merge commits.
Done
It adds the ability to automatically detect and install the Kubeflow SDK when components use kubeflow imports, while also providing explicit extras for opt-in installation.
@Prateekbala @mprahl I generally dislike over-smart behaviors like this. Kubeflow imports inside users' component code are no different from any other library imports, and it should be the users' responsibility to specify the dependencies they want to install.
We should stick to standard Python dependency management. Inferring dependencies from imports is problematic and rarely the right approach.
In practice, as the SDK evolves, there could be breaking changes, the version you install may not actually work with the user code, and there could be transitive dependencies incompatible with user-specified dependencies, which is one of the reasons we opted to install kfp with no dependencies.
Hi @chensun, I understand the concern around inferring dependencies. In the earlier discussion on #12027 and in the Kubeflow SDK design thread, @mprahl proposed using an AST-based check to detect kubeflow imports, alongside a kubeflow extras option. The direction in that conversation was to implement this AST-based detection together with the extras entry as an initial solution, with the expectation that it could be revised as the broader Kubeflow SDK integration work evolves.
I'm wondering if we should revisit this given @chensun's concerns? Any other concerns from @kubeflow/wg-pipeline-leads about this approach or are we happy to move forward?
@chensun Thanks for the review. You raise valid points regarding the risks of "magic" behavior and dependency management. I definitely agree that we want to avoid inferring dependencies for general libraries.
However, I believe the Kubeflow SDK warrants a special exception separate from standard third-party packages. Our goal is to encourage seamless usage of the wider Kubeflow ecosystem within components. Conceptually, this somewhat aligns with how we already pre-install the kfp SDK into the container at runtime; we are simply extending that convenience to the broader Kubeflow namespace.
To address your concerns about version drift and user control, I propose the following safeguards for this PR:
- Respecting Opt-Outs: If the user explicitly disables kfp package installation in the component decorator, we should also disable the automatic detection/installation of the Kubeflow SDK.
- Strict Version Pinning: We could pin the Kubeflow SDK version to an x.y version in setup of the KFP SDK. This ensures the injected SDK is always aligned with the runtime environment, mitigating the risk of breaking changes or incompatibility.
- Best-Effort Detection: We could keep the autodetect logic for kubeflow imports, but it will only trigger the installation of the pinned, compatible version defined above.
- CI Verification: Let's add a pipeline run in CI to explicitly test this autodetection and installation path, ensuring the pinned version installs successfully without conflict.
- Visibility: We should log in the executor that Kubeflow SDK usage was detected and it will be automatically installed. We can also provide instructions to disable it.
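To make the proposed safeguards concrete, here is a rough sketch of how the opt-out, pinning, and logging rules could combine. Everything named here is an assumption for illustration: `resolve_packages` and `KUBEFLOW_SDK_REQUIREMENT` are hypothetical, the pinned range is a placeholder rather than a decided value, and the sketch logs where the package list is built, whereas the proposal is to log in the executor. Duplicate checks against user-specified kubeflow entries are omitted for brevity.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger(__name__)

# Placeholder pin for illustration only; the actual range was still under discussion.
KUBEFLOW_SDK_REQUIREMENT = 'kubeflow>=0.2.0,<0.3.0'


def resolve_packages(
    packages_to_install: Iterable[str],
    uses_kubeflow: bool,
    install_kfp_package: bool = True,
) -> List[str]:
    """Apply the proposed safeguards when building the install list."""
    packages = list(packages_to_install)

    # Respecting opt-outs: disabling kfp installation also disables auto-install.
    if not install_kfp_package:
        return packages

    # Best-effort detection: only act when a kubeflow import was found.
    if not uses_kubeflow:
        return packages

    # Visibility: tell the user what was added and why.
    logger.info(
        'Kubeflow SDK import detected; adding %s to packages_to_install.',
        KUBEFLOW_SDK_REQUIREMENT,
    )
    packages.append(KUBEFLOW_SDK_REQUIREMENT)
    return packages
```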
Hey @mprahl, I've addressed the review feedback and added some safeguards based on the concerns. A few things I need clarification on:
- Kubeflow SDK version - Which version should we actually pin to? Is it 1.9.0? Should it match a specific KFP version or use a range like >=1.9.0,<2.0?
- Logging - Where should we log the auto-detection? At compile time when building the spec, or at runtime in the executor?
- Default mode - Currently AUTO is the default. Given the concerns, should we start with SKIP as the default instead and let users opt in?
- Kubeflow SDK version - Which version should we actually pin to? Is it 1.9.0? Should it match a specific KFP version or use a range like >=1.9.0,<2.0?
I think we can keep the range >=0.2.0,<0.3.0 for now and bump it every KFP release. @andreyvelich @kramaranya do you agree with that?
- Logging - Where should we log the auto-detection? At compile time when building the spec, or at runtime in the executor?
I was thinking the executor so the user can have debug information about which Kubeflow SDK was used at runtime.
- Default mode - Currently AUTO is the default. Given the concerns, should we start with SKIP as the default instead and let users opt in?
I think if the install_kfp_package argument is True, it should default to AUTO. Otherwise, let the user explicitly specify the behavior.
I think we can just keep >=0.2.0 and it should install the latest version, so we don't need to bump it every time
@kramaranya If we are planning to introduce breaking changes between minor releases, it might be better to stick with <0.3 for now so that users' pipelines don't break after an upgrade.
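The difference between the two ranges being debated can be shown with a small, self-contained sketch. The version numbers below are hypothetical, not real Kubeflow SDK releases, and the `satisfies` helper handles only plain dotted numeric versions; pip itself follows the fuller PEP 440 rules.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.2.5' into (0, 2, 5); plain numeric versions only."""
    return tuple(int(part) for part in text.split('.'))


def satisfies(version: str, lower: str, upper: Optional[str] = None) -> bool:
    """Check lower <= version, and version < upper when an upper bound is given."""
    parsed = parse_version(version)
    if parsed < parse_version(lower):
        return False
    if upper is not None and parsed >= parse_version(upper):
        return False
    return True


# Hypothetical release numbers used purely for illustration.
candidates = ['0.1.0', '0.2.0', '0.2.5', '0.3.0']

# '>=0.2.0' admits a future 0.3.0, which may contain breaking changes.
open_ended = [v for v in candidates if satisfies(v, '0.2.0')]

# '>=0.2.0,<0.3.0' stays within the 0.2.x line.
bounded = [v for v in candidates if satisfies(v, '0.2.0', '0.3.0')]
```

With an open-ended lower bound, the version a component installs can change whenever a new minor release is published, even if neither the pipeline nor the KFP SDK changed.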
Thank you very much for your efforts @Prateekbala!
We discussed this PR during today's KFP community call. The maintainers have decided to put a pause on this effort due to a versioning concern:
- The Concern: The KFP SDK version used to compile a pipeline does not necessarily match the active KFP deployment.
- The Risk: If we pin the Kubeflow SDK at the KFP SDK level, we risk installing a Kubeflow SDK that is incompatible with the actual Kubeflow deployment.
Alternative Proposal
Instead, we believe this should be handled as an API Server concern, not a KFP SDK concern. The system should ideally resolve the associated Kubeflow SDK version (using major.minor/x.y alignment) based on the deployed KFP version. This ensures we always use the SDKs compatible with what is actually installed.
Since this architectural change has significant repercussions, we need a member of the KFP community to propose a KEP (Kubeflow Enhancement Proposal) to define this path forward.
Thanks for the update @mprahl! I’ll pause the PR for now. Happy to help and contribute more.