kfctl is overly reliant on expensive E2E tests, slowing down development

Open: jlewi opened this issue 6 years ago • 5 comments

kfctl is overly reliant on expensive E2E tests.

I'm seeing presubmits take 50 minutes to run. Furthermore, as these E2E tests become more comprehensive, they inherently become flakier.

I think we need to rethink our test strategy for kfctl to ensure velocity remains high.

  • For example, how can we develop a component test for kfctl that doesn't end up being a complete E2E test for Kubeflow?

    • e.g. when testing changes to kfctl, does it really make sense to test that all the Kubeflow applications are deployed and healthy?

jlewi, Oct 16 '19

/cc @nrchakradhar

nrchakradhar, Oct 16 '19

One idea would be to

  • Selectively trigger plugin-specific tests only when that plugin changes
  • Use purpose-built kustomize applications and KFDef files to test the bulk of kfctl functionality

The bulk of kfctl functionality is just about processing a KFDef to deploy a bunch of kustomize applications.

We shouldn't really need to use actual Kubeflow manifests or create clusters to test that.

We could create simple kustomize manifests that perform very cheap, reliable operations, e.g. a Job that just runs "echo Hello world".

That could easily run in the test cluster (where the Prow jobs run) rather than in a newly created cluster.

Plugin authors (e.g. authors of gcp.go, aws.go, existing.go) could still trigger more expensive comprehensive tests selectively when those plugins are modified.

jlewi, Oct 16 '19

@swiftdiaries this is the issue tracking trying to come up with a better way of testing kfctl.

jlewi, Nov 06 '19

Should we punt this until after 1.0? I don't think anyone is working on this, and I'm not sure we will have cycles to fix it before we want to release 1.0.

jlewi, Jan 06 '20

Also, I think the direction we want to move in is that kfctl should focus mostly on helping users build manifests, not on applying/deleting them. apply/delete (#293) should be mostly synonymous with kubectl apply/delete.

So kfctl testing should focus mostly on generating the config files and validating them, not necessarily on applying them and verifying that they produce a working deployment.

So, similar to #1016, kfctl tests should mostly run kfctl build and validate that the produced configs match the expected configs. This would eliminate a huge source of presubmit flakiness.

jlewi, Apr 09 '20