kfctl is overly reliant on expensive E2E tests, slowing down development
kfctl is overly reliant on expensive E2E tests.
I'm seeing presubmits take 50 minutes to run. Furthermore, as these E2E tests become more comprehensive, they inherently become flakier.
I think we need to rethink our test strategy for kfctl to ensure velocity remains high.
For example:
- How can we develop a component test for kfctl that doesn't end up being a complete E2E test for Kubeflow?
- Does it really make sense, when testing changes to kfctl, to verify that all the Kubeflow applications are deployed and healthy?
/cc @nrchakradhar
One idea would be to:
- Selectively trigger tests specific to a plugin when that plugin changes
- Use purpose-built kustomize applications and KFDef files to test the bulk of kfctl functionality
The bulk of kfctl's functionality is just processing a KFDef to deploy a set of kustomize applications. We shouldn't need actual Kubeflow manifests or freshly created clusters to test that. We could create simple kustomize manifests that perform really cheap and reliable operations, e.g. a Job that just runs "echo Hello world". Those could easily run in the test cluster (where Prow jobs run) rather than requiring a new cluster.
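To make this concrete, here is a minimal sketch of what such a test fixture could look like. Everything here is hypothetical (the app name, repo URI, and file paths), and the KFDef fields follow the v1 schema as I remember it, so treat this as illustrative:

```yaml
# kfdef_test.yaml -- a hypothetical KFDef that deploys a single cheap app
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: kfctl-test
spec:
  applications:
  - name: echo-test-app
    kustomizeConfig:
      repoRef:
        name: test-manifests
        path: echo-test-app
  repos:
  - name: test-manifests
    uri: https://example.com/test-manifests.tar.gz
---
# echo-test-app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- job.yaml
---
# echo-test-app/job.yaml -- a Job that is cheap and deterministic to run
apiVersion: batch/v1
kind: Job
metadata:
  name: echo-hello-world
spec:
  template:
    spec:
      containers:
      - name: echo
        image: busybox
        command: ["echo", "Hello world"]
      restartPolicy: Never
```

Deploying that KFDef would still exercise the whole download/generate/apply pipeline, but the only thing that has to come up in the cluster is a busybox Job.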
Plugin authors (e.g. the authors of gcp.go, aws.go, existing.go) could still selectively trigger more expensive, comprehensive tests when those plugins are modified.
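Prow already supports this kind of selective triggering via run_if_changed. Something like the following sketch (the job name, image, script path, and plugin path are all made up for illustration) would only fire when the GCP plugin is touched:

```yaml
# Hypothetical Prow presubmit: only runs when files under the GCP plugin change
presubmits:
  kubeflow/kfctl:
  - name: kfctl-gcp-e2e
    run_if_changed: "^pkg/kfapp/gcp/.*"
    decorate: true
    spec:
      containers:
      - image: gcr.io/kubeflow-ci/test-worker:latest
        command: ["./testing/e2e/gcp_test.sh"]
```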
@swiftdiaries this is the issue tracking our efforts to come up with a better way of testing kfctl.
Should we punt this to post 1.0? I don't think anyone is working on this and I'm not sure we will have cycles to fix it before we want to release 1.0.
Also, I think the direction we want to move in is that kfctl should mostly focus on helping users build manifests, not on applying/deleting them; apply/delete (#293) should mostly be synonymous with kubectl apply/delete.
So kfctl testing should mostly focus on generating the config files and validating them, but not necessarily on applying them and verifying that they produce a working deployment.
Similar to #1016, kfctl tests should mostly run kfctl build and validate that the produced configs match the expected configs. This would eliminate a huge source of presubmit flakiness.
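A rough sketch of what such a presubmit could look like, assuming a kfctl binary is baked into the image and using hypothetical paths for the test KFDef and golden configs:

```yaml
# Hypothetical Prow presubmit: build configs and diff them against golden files.
# No cluster is created, so the job is fast and hard to flake.
presubmits:
  kubeflow/kfctl:
  - name: kfctl-build-golden
    decorate: true
    spec:
      containers:
      - image: gcr.io/kubeflow-ci/test-worker:latest
        command:
        - /bin/sh
        - -c
        - |
          # kfctl build generates the kustomize output without applying it
          kfctl build -V -f testdata/kfdef_test.yaml
          # fail on any drift from the checked-in golden configs
          diff -r kustomize testdata/golden
```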