Central driver POC
Description of your changes:
POC for https://github.com/kubeflow/pipelines/pull/12023
Changes:
- I modified the Argo compiler in the API server — it now generates a workflow spec with the driver plugin instead of a container. The driver is now hosted as a server inside the agent.
- I built modified images for the API server (for compiling a new Argo workflow spec) and added the KFP driver server image (hosted by the executor plugin).
- Added the necessary SA/tokens and additional rules according to the documentation.
- Built images from the branch and pushed them to docker.io.
How to launch:
I built multi-layer container images on both Apple M-series (ARM64) and Linux/AMD64 platforms. If you're using one of these architectures, you can safely reuse the images from Docker Hub (ntny/kfp-driver:beta-poc and ntny/kfp-api-server:beta-poc). These images are already referenced in the manifests in this branch. If your architecture is different, you will need to build Dockerfile and Dockerfile.driver yourself from this branch and replace the image references in the manifests with your own before following the instructions below.
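For a different architecture, the rebuild could look roughly like the sketch below. It is not part of this PR. `REGISTRY` is a hypothetical placeholder, the `backend/` location of the two Dockerfiles is an assumption based on the upstream repository layout, and the mapping of `Dockerfile` to the API server image and `Dockerfile.driver` to the driver image is inferred from the description above. By default the script only prints the commands (`DRY_RUN=1`); set `DRY_RUN=0` to run them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical registry/namespace; replace with your own.
REGISTRY="${REGISTRY:-example.com/me}"
TAG="${TAG:-beta-poc}"
# Assumption: the Dockerfiles live under backend/, as in upstream KFP.
DOCKERFILE_DIR="${DOCKERFILE_DIR:-backend}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build one image from the repository root and push it.
build_and_push() {
  local name="$1" dockerfile="$2"
  run docker build -f "$dockerfile" -t "$REGISTRY/$name:$TAG" .
  run docker push "$REGISTRY/$name:$TAG"
}

build_and_push kfp-api-server "$DOCKERFILE_DIR/Dockerfile"
build_and_push kfp-driver "$DOCKERFILE_DIR/Dockerfile.driver"
```

After pushing, the image references in the manifests still need to be pointed at the new tags by hand.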
I use, and have prepared, a platform-agnostic env inside minikube (single user).
- move to the root of the project and run:
kubectl apply -k ./manifests/kustomize/cluster-scoped-resources
- wait about 30 seconds and run
kubectl apply -k ./manifests/kustomize/env/platform-agnostic
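The two steps above can also be scripted. The sketch below replaces the fixed 30-second pause with `kubectl wait` on the `applications.app.k8s.io` CRD, which is what the upstream KFP install instructions do; that this branch's cluster-scoped resources still install that CRD is an assumption, so fall back to a plain `sleep 30` if the wait fails. By default the script only prints the commands (`DRY_RUN=1`); set `DRY_RUN=0` to run them from the root of the project.

```bash
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_kfp_poc() {
  # Step 1: cluster-scoped resources (CRDs, namespace, cluster roles).
  run kubectl apply -k ./manifests/kustomize/cluster-scoped-resources
  # Assumption: this CRD is installed by step 1, as in upstream KFP.
  run kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
  # Step 2: the platform-agnostic environment itself.
  run kubectl apply -k ./manifests/kustomize/env/platform-agnostic
}

install_kfp_poc
```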
Forward the UI port as usual:
```bash
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
```
I have tested this POC on the preinstalled [Tutorial] Data passing in Python components pipeline. Drivers are not created, and the agent is used instead (and removed after the pipeline has finished).
Please note: this is just a POC and not a production-ready solution.
Hi @ntny. Thanks for your PR.
I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
🚫 This command cannot be processed. Only organization members or owners can use the commands.
/hold
This is EPIC, @ntny! Can't wait to try it out.
/unhold
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mprahl for approval. For more information see the Kubernetes Code Review Process.
Hi @HumairAK @droctothorpe would you mind giving this a try? It should be pretty straightforward to run the cluster with the agent only without the driver by following the instructions above.
Hi! @nsingla I made intentional changes to the compiler, and manually updating all specs in test/compiled-workflow would be very time-consuming. I've already used the following code on my side to regenerate the specs directly from the test using a special flag (similar to snapshot tests) and then reviewed the diff manually. Do you have any concerns about this approach, given your experience with test code and test practices?
You don't need to update them manually; you can run the compiler tests locally with this flag:
ginkgo -v -- -updateCompiledFiles=true
This should update the workflows.
/ok-to-test
Hey, @ntny . Unfortunately, I won't have bandwidth to validate it in the next two weeks but just wanted to let you know that it's on my radar and I will get to it as soon as I can. Maybe someone else will get to it before me. VERY excited about this. Kudos!
Hi, thanks! Sure, absolutely no rush!