cluster-api-provider-bringyourownhost Investigate how fast can we get all the e2e tests to run

Describe the solution you'd like

At the moment, running all the e2e tests takes about 30 minutes. Can we investigate what the bottleneck is, and whether it is possible to speed this up?

Nov 26 '21 05:11 jamiemonserrate

This is where all our e2e tests live - https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/tree/main/test/e2e

This issue involves doing some research on what steps in the tests are consuming a lot of time and if there are ways to mitigate this.

Feb 04 '22 04:02 anusha94

Good to start with the quickstart tests - https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/blob/main/test/e2e/e2e_test.go

[ ] identify how much time each step takes (test setup, creating management cluster, docker hosts, apply cluster template, log collection, teardown)
[ ] identify possible improvements
[ ] the team can then review what improvements can be done in a reasonable time
[ ] implement!

Feb 25 '22 05:02 anusha94

This would be an interesting investigation! 😄

Feb 25 '22 06:02 karuppiah7890

Personally, I have seen tests run faster with more CPU and RAM. This is based on experience from TCE when running E2E tests for Docker clusters (CAPD). For other clusters like AWS, Azure, and VMC (vSphere), the speed depended on the kind of machines we were spinning up on those platforms, and required only a few resources (CPU and RAM) from the host machine running the tanzu CLI, where the kind bootstrap cluster had to run on top of Docker.

Also, one tricky thing: the Docker runtime usually has access to all the resources of the host machine, but that need not be the case, so that's something to check too. On dev machines, many devs allocate only part of their host resources to the Docker runtime. In CI/CD environments with container support, like GitHub Actions, the VMs usually give the Docker runtime full resources, but it's worth confirming this during the investigation rather than assuming.

Feb 25 '22 06:02 karuppiah7890

Building the host agent separately for every container takes time. If we move the build to suit-test.go so that it runs only once for the entire e2e run, we can save some time. My test results are as follows:

Before this change:

e2e_test: time elapse: 4m38.405165171s
md_scale_test: time elapse: 11m2.787283875s
byohost_reuse_test: time elapse: 7m14.318728577s

After this change:

e2e_test: time elapse: 5m27.469585284s
md_scale_test: time elapse: 7m26.022867913s
byohost_reuse_test: time elapse: 6m51.386107789s

md_scale_test benefits the most from this change.

This is done by https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/pull/404
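
The general idea of building once and reusing the result can be sketched with `sync.Once`. This is an illustration only and not necessarily how the PR implements it; `buildHostAgent`, `HostAgentBinary`, and the returned path are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	buildOnce  sync.Once
	buildCount int    // counts how many times the expensive build ran
	agentPath  string // cached result of the single build
)

// buildHostAgent is a hypothetical stand-in for the expensive build step.
func buildHostAgent() string {
	buildCount++
	return "/tmp/host-agent-binary" // placeholder path, not taken from the repo
}

// HostAgentBinary builds the agent on the first call and reuses it afterwards.
func HostAgentBinary() string {
	buildOnce.Do(func() { agentPath = buildHostAgent() })
	return agentPath
}

func main() {
	for _, spec := range []string{"e2e_test", "md_scale_test", "byohost_reuse_test"} {
		fmt.Println(spec, "uses", HostAgentBinary())
	}
	fmt.Println("builds performed:", buildCount)
}
```

In a Ginkgo suite the same effect would more commonly come from doing the build in suite-level setup, so every spec picks up the already-built binary.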

Feb 28 '22 09:02 huchen2021

I added some timing checkpoints and got some data. The total test run took 21m25.592598031s, including e2e_test at 4m31.395224658s and reuse_test at 5m43.749294255s. I'm not sure of the exact duration of md_scale_test, because it reported errors when stopping the container at the very end of the test. It should be more than 7m. The detailed data is as follows:

setupBootstrapCluster: 1m8.094087766s
initBootstrapCluster: 1m4.417632025s

e2e_test: Total: 4m31.395224658s

BeforeEach: time elapse: 45.860970377s
Creating byohost capacity pool: time elapse: 3.294036855s
creating a workload cluster: time elapse: 3m52.242673924s
dumpSpecResourcesAndCleanup: time elapse: 10.610023703s
clean up byoh container and files: time elapse: 21.341578337s

reuse_test: Total: 5m43.749294255s

BeforeEach: 40.732290086s
Creating byohost capacity pool: 2.968998621s
Creating a cluster: 2m51.500562216s
Delete the cluster and freeing the ByoHosts: 10.060083121s
Creating a new cluster: 2m0.959306317s
dumpSpecResourcesAndCleanup: 10.426351336s
clean up byoh container and files: 21.062230394s

md_scale_test: Total: (no value captured, because it reported errors when stopping the container)

BeforeEach: 40.594427525s
Creating byohost capacity pool: 7.89641687s
Creating a workload cluster: 6m1.652912815s
Scaling the MachineDeployment out to 3: 50.432745806s
Scaling the MachineDeployment out to 3: 10.14809982s
dumpSpecResourcesAndCleanup: 10.529640877s

Mar 07 '22 06:03 huchen2021
