[feature] Build and test V2 driver / launcher images against incoming PRs
Feature Area
/area backend /area samples
What feature would you like to see?
The V2 backend driver / launcher images being built and tested against incoming PRs through integration tests / etc.
What is the use case or pain point?
Ensure the stability of the driver / launcher.
Is there a workaround currently?
Trust people to test driver / launcher locally.
More details
The kfp-cluster action, which is used by the workflows listed below, uses build-images.sh to build a set of images and push them to the kind registry:
- e2e-test.yml
- kfp-kubernetes-execution-tests.yml
- kfp-samples.yml <-- My primary focus at the moment
- kubeflow-pipelines-integration-v2.yml
- periodic.yml
- sdk-execution.yml
- upgrade-test.yml
The set of images built by build-images.sh does not currently include the V2 driver and launcher.
Even if this is changed, additional work would still be required to ensure that the built images are actually used by the backend during testing. Namely, the backend has defaults for which images to use (see here), which normally point to gcr.io locations. These defaults would need to be overridden so that, during PR testing, the built images are used instead of the ones previously published to gcr.io.
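To make the override-versus-default behaviour concrete, here is a rough Python sketch of the lookup being described. It is only an illustration, not the backend's actual code: the default image references are hypothetical placeholders, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical placeholder defaults; the real values live in the backend source.
DEFAULT_IMAGES = {
    "V2_DRIVER_IMAGE": "gcr.io/example-project/driver:released",
    "V2_LAUNCHER_IMAGE": "gcr.io/example-project/launcher:released",
}


def resolve_image(var_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the override from the environment, or the default if none is set.

    Assumption: an empty value is treated the same as an unset variable.
    """
    env = os.environ if env is None else env
    return env.get(var_name) or DEFAULT_IMAGES[var_name]


if __name__ == "__main__":
    # With no override, the (placeholder) gcr.io default is used.
    print(resolve_image("V2_DRIVER_IMAGE", env={}))
    # With an override, the PR-built image in the kind registry is used.
    print(resolve_image(
        "V2_DRIVER_IMAGE",
        env={"V2_DRIVER_IMAGE": "kind-registry:5000/driver:latest"},
    ))
```

In PR testing the second case is the one we would want to hit; today only the first one is exercised.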
Discussion of implementation
- Updating build-images.sh would likely be pretty straightforward.
- The argo compiler accepts V2_DRIVER_IMAGE / V2_LAUNCHER_IMAGE environment variables to override the gcr.io defaults (configured via the deployment.apps/ml-pipeline deployment). @hbelmiro has suggested maybe using a Kustomize layer for updating these during testing.
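As a starting point for the Kustomize idea, the patch could look something like what the sketch below generates. This is a hedged sketch, not a tested change: the container name ml-pipeline-api-server and the kind-registry:5000/... image references are assumptions that would need to be checked against the manifests and build-images.sh. The script prints a strategic-merge-style patch as JSON (which is also valid YAML) that sets the two environment variables on the ml-pipeline deployment.

```python
import json


def build_image_override_patch(
    driver_image: str,
    launcher_image: str,
    container_name: str = "ml-pipeline-api-server",  # assumed container name
) -> dict:
    """Build a patch that sets the V2 image env vars on the ml-pipeline deployment."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "ml-pipeline"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "env": [
                                {"name": "V2_DRIVER_IMAGE", "value": driver_image},
                                {"name": "V2_LAUNCHER_IMAGE", "value": launcher_image},
                            ],
                        }
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical image references for images pushed to the kind registry.
    patch = build_image_override_patch(
        "kind-registry:5000/driver:latest",
        "kind-registry:5000/launcher:latest",
    )
    print(json.dumps(patch, indent=2))
```

The output could be saved as a patch file in a test-only overlay, so the base manifests (and therefore the gcr.io defaults) stay untouched.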
What about releases?
Although it makes sense to build the driver / launcher images and test them during PRs, it may make sense NOT to override the V2_DRIVER_IMAGE / V2_LAUNCHER_IMAGE defaults when validating releases, and to test against the gcr.io deployments instead. Since users are unlikely to override these values and will therefore use the gcr.io images, it is reasonable to test in that configuration.
I am not aware of the extent to which kfp-samples.yml (or other workflows consuming the kfp-cluster action) are executed during release processes. Please let me know if others have more info on this :)
Related slack thread: https://cloud-native.slack.com/archives/C073N7BMLB1/p1727104197895549
Love this idea? Give it a 👍.