Central driver POC

Open ntny opened this issue 3 months ago • 12 comments

Description of your changes:

POC for https://github.com/kubeflow/pipelines/pull/12023

Changes:

  • I modified the Argo compiler in the API server; it now generates a workflow spec that uses the driver plugin instead of a container. The driver is now hosted as a server inside the agent.
  • I built modified images for the API server (to compile the new Argo workflow spec) and added the KFP driver server image (hosted by the executor plugin).
  • Added the necessary service accounts/tokens and additional rules, according to the documentation.
  • Built images from the branch and pushed them to docker.io.

How to launch:

I built multi-layer container images on both Apple M-series (ARM64) and Linux/AMD64 platforms. If you're using one of these architectures, you can safely reuse the images from Docker Hub (ntny/kfp-driver:beta-poc and ntny/kfp-api-server:beta-poc). These images are already referenced in the manifests in this branch. If your architecture is different, you will need to build the Dockerfile and Dockerfile.driver yourself from this branch and replace the images with your own here and here before proceeding with the remaining instructions.
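A minimal sketch of what such a rebuild could look like, for readers on another architecture. The Dockerfile paths and the `my-registry` prefix in the example are placeholders, not locations confirmed by this branch, and the helper names are made up for illustration. Only `docker build` with `--platform`, `-f`, and `-t` is standard Docker CLI. By default the helper only prints the command (`DRY_RUN=1`).

```bash
# Sketch only: build the POC images for the local CPU architecture.
# Dockerfile paths and registry prefix are placeholders (assumptions).
set -eu

# Map `uname -m` output to a Docker platform string.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# build_image DOCKERFILE TAG PLATFORM
# Prints the command when DRY_RUN=1 (default); runs it when DRY_RUN=0.
build_image() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker build --platform $3 -f $1 -t $2 ."
  else
    docker build --platform "$3" -f "$1" -t "$2" .
  fi
}

# Example, run from the repository root (adjust the placeholder paths and tags):
#   platform="$(docker_platform "$(uname -m)")"
#   build_image path/to/Dockerfile        my-registry/kfp-api-server:beta-poc "$platform"
#   build_image path/to/Dockerfile.driver my-registry/kfp-driver:beta-poc     "$platform"
```

After building and pushing, the image references in the manifests still have to be pointed at the new tags, as described above.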

I use, and have prepared, a platform-agnostic environment inside minikube (single user).

  • Move to the root of the project and run:
kubectl apply -k ./manifests/kustomize/cluster-scoped-resources
  • Wait about 30 seconds, then run:
kubectl apply -k ./manifests/kustomize/env/platform-agnostic
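As an alternative to a fixed 30-second pause, the two steps can be sketched with an explicit wait. This assumes the cluster-scoped resources in this branch still install the `applications.app.k8s.io` CRD, as the upstream KFP manifests do; that has not been verified for the POC. The `KUBECTL` variable and the `deploy_poc` name are illustrative only.

```bash
# Sketch only: apply both kustomizations, waiting on a CRD instead of sleeping.
# Assumption: cluster-scoped-resources installs the applications.app.k8s.io CRD
# (true for upstream KFP manifests; check it for this branch).
set -eu

KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to preview the commands

deploy_poc() {
  $KUBECTL apply -k ./manifests/kustomize/cluster-scoped-resources
  $KUBECTL wait --for condition=established --timeout=60s crd/applications.app.k8s.io
  $KUBECTL apply -k ./manifests/kustomize/env/platform-agnostic
}

# Run from the repository root:
#   deploy_poc
```

If the wait times out, falling back to the manual pause described above is the safer option.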


Forward the UI port as usual:
```bash
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
```

I have tested this POC on the preinstalled "[Tutorial] Data passing in Python components" pipeline. Drivers are not created; the agent is used instead (and is removed after the pipeline has finished).

Screenshot 2025-09-25 at 12 25 03

Please note: this is just a POC and not a production-ready solution.

ntny avatar Sep 22 '25 18:09 ntny

Hi @ntny. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Sep 22 '25 18:09 google-oss-prow[bot]

🚫 This command cannot be processed. Only organization members or owners can use the commands.

github-actions[bot] avatar Sep 22 '25 18:09 github-actions[bot]

/hold

ntny avatar Sep 22 '25 20:09 ntny

This is EPIC, @ntny! Can't wait to try it out.

droctothorpe avatar Sep 25 '25 02:09 droctothorpe

/unhold

ntny avatar Sep 27 '25 13:09 ntny

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign mprahl for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Sep 30 '25 16:09 google-oss-prow[bot]

Hi @HumairAK @droctothorpe would you mind giving this a try? It should be pretty straightforward to run the cluster with the agent only without the driver by following the instructions above.

ntny avatar Sep 30 '25 16:09 ntny

Hi! @nsingla I made intentional changes to the compiler, and manually updating all specs in test/compiled-workflow would be very time-consuming. I’ve already used the following code on my side to regenerate specs directly from the test using a special flag (similar to snapshot tests) and then review the diff manually. Do you have any concerns about this approach, given your experience with test code and test practices?

ntny avatar Sep 30 '25 16:09 ntny

> Hi! @nsingla I made intentional changes to the compiler, and manually updating all specs in test/compiled-workflow would be very time-consuming. I’ve already used the following code on my side to regenerate specs directly from the test using a special flag (similar to snapshot tests) and then review the diff manually. Do you have any concerns about this approach, given your experience with test code and test practices?

You don't need to update them manually. You can run the compiler tests locally with the flag `ginkgo -v -- -updateCompiledFiles=true`; this should update the workflows.
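For readers unfamiliar with the pattern, here is a minimal, generic sketch of an "update flag" for golden-file (snapshot) checks. It is not the KFP test code: the `check_golden` helper, the `UPDATE_GOLDEN` variable, and the file names are all made up for illustration.

```bash
# Sketch only: the update-flag pattern behind golden-file (snapshot) tests.
# Names here are illustrative and not part of the KFP test suite.
set -eu

# check_golden ACTUAL GOLDEN
# With UPDATE_GOLDEN=true, overwrite GOLDEN with ACTUAL.
# Otherwise succeed only if the two files are byte-identical.
check_golden() {
  if [ "${UPDATE_GOLDEN:-false}" = "true" ]; then
    cp "$1" "$2"
    echo "updated $2"
  elif cmp -s "$1" "$2"; then
    echo "ok $2"
  else
    echo "mismatch $2" >&2
    return 1
  fi
}

# Example:
#   UPDATE_GOLDEN=true check_golden out/workflow.yaml golden/workflow.yaml
```

After a regeneration run, reviewing the resulting changes with `git diff` before committing keeps unintended spec changes from slipping in.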

nsingla avatar Sep 30 '25 18:09 nsingla

/ok-to-test

zazulam avatar Oct 01 '25 18:10 zazulam

Hey, @ntny. Unfortunately, I won't have bandwidth to validate it in the next two weeks, but I just wanted to let you know that it's on my radar and I will get to it as soon as I can. Maybe someone else will get to it before me. VERY excited about this. Kudos!

droctothorpe avatar Oct 05 '25 01:10 droctothorpe

> Hey, @ntny. Unfortunately, I won't have bandwidth to validate it in the next two weeks but just wanted to let you know that it's on my radar and I will get to it as soon as I can. Maybe someone else will get to it before me. VERY excited about this. Kudos!

Hi, thanks! Sure, absolutely no rush!

ntny avatar Oct 08 '25 19:10 ntny