cluster-api-provider-bringyourownhost

Investigate how fast can we get all the e2e tests to run

Open jamiemonserrate opened this issue 3 years ago • 6 comments

Describe the solution you'd like

At the moment, running all the e2e tests takes around 30 minutes. Can we investigate what the bottleneck is, and whether it is possible to speed this up?

jamiemonserrate avatar Nov 26 '21 05:11 jamiemonserrate

This is where all our e2e tests live - https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/tree/main/test/e2e

This issue involves doing some research on what steps in the tests are consuming a lot of time and if there are ways to mitigate this.

anusha94 avatar Feb 04 '22 04:02 anusha94

Good to start with the quickstart tests - https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/blob/main/test/e2e/e2e_test.go

  • [ ] identify how much time each step takes (test setup, creating management cluster, docker hosts, apply cluster template, log collection, teardown)
  • [ ] identify possible improvements
  • [ ] the team can then review what improvements can be done in a reasonable time
  • [ ] implement!

anusha94 avatar Feb 25 '22 05:02 anusha94

This would be an interesting investigation! 😄

karuppiah7890 avatar Feb 25 '22 06:02 karuppiah7890

Personally, I have seen better speed with more CPU and RAM. This is based on experience from TCE when running E2E tests for Docker clusters (CAPD). For other clusters like AWS, Azure, and VMC (vSphere), the speed depended on the kind of machines we were spinning up on those platforms, and the tests needed only a small amount of resources (CPU and RAM) from the host machine running the tanzu CLI, where the kind bootstrap cluster had to run on top of Docker.

Also, one tricky thing: the Docker runtime usually has access to all the resources of the host machine, but that need not be the case, so that's something to check too. On dev machines, many devs allocate only part of their host resources to the Docker runtime. In CI/CD environments with container support, like GitHub Actions, the VMs usually give the Docker runtime full resources, but it's worth confirming this during the investigation rather than assuming.
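A quick way to confirm this is to compare what the Docker daemon reports with what the host has. The sketch below assumes the `docker` CLI is on the PATH and uses the `NCPU` and `MemTotal` fields of `docker info`; the function names are made up for this example.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"strings"
)

// parseDockerInfo parses the "<ncpu> <memBytes>" line produced by the
// format string used in dockerResources.
func parseDockerInfo(out string) (ncpu int, memBytes int64, err error) {
	_, err = fmt.Sscanf(strings.TrimSpace(out), "%d %d", &ncpu, &memBytes)
	return ncpu, memBytes, err
}

// dockerResources asks the Docker daemon how many CPUs and how much
// memory it can see.
func dockerResources() (int, int64, error) {
	out, err := exec.Command("docker", "info",
		"--format", "{{.NCPU}} {{.MemTotal}}").Output()
	if err != nil {
		return 0, 0, err
	}
	return parseDockerInfo(string(out))
}

func main() {
	ncpu, mem, err := dockerResources()
	if err != nil {
		fmt.Println("could not query docker:", err)
		return
	}
	fmt.Printf("host CPUs: %d, docker CPUs: %d, docker memory: %.1f GiB\n",
		runtime.NumCPU(), ncpu, float64(mem)/(1<<30))
	if ncpu < runtime.NumCPU() {
		fmt.Println("docker sees fewer CPUs than the host")
	}
}
```

Running this once on a dev machine and once in the CI job would show whether the two environments give Docker comparable resources.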

karuppiah7890 avatar Feb 25 '22 06:02 karuppiah7890

Building the host agent separately for every container takes time. If we move the build to suit-test.go so that it runs only once for the entire e2e run, it can save some time. My test results are as follows:

Before this change:

  • e2e_test: time elapse: 4m38.405165171s
  • md_scale_test: time elapse: 11m2.787283875s
  • byohost_reuse_test: time elapse: 7m14.318728577s

After this change:

  • e2e_test: time elapse: 5m27.469585284s
  • md_scale_test: time elapse: 7m26.022867913s
  • byohost_reuse_test: time elapse: 6m51.386107789s

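The saving is largest for md_scale_test (about 3m37s). byohost_reuse_test was about 23s faster, while e2e_test was about 49s slower in this run.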

This is done by https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/pull/404
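The build-once idea can be sketched independently of the test framework with `sync.Once`. This is not the code from the PR: `buildHostAgent`, `hostAgentBinary`, and the returned path are placeholders invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	buildOnce  sync.Once
	agentPath  string
	buildErr   error
	buildCount int // counts real builds, only to demonstrate the effect
)

// buildHostAgent stands in for the expensive build step.
// Hypothetical: the real build command and output path will differ.
func buildHostAgent() (string, error) {
	buildCount++
	return "/tmp/byoh-hostagent", nil
}

// hostAgentBinary builds the agent on the first call and returns the
// cached result (path and error) on every later call.
func hostAgentBinary() (string, error) {
	buildOnce.Do(func() {
		agentPath, buildErr = buildHostAgent()
	})
	return agentPath, buildErr
}

func main() {
	// Three "containers" ask for the binary; only the first triggers a build.
	for i := 0; i < 3; i++ {
		p, err := hostAgentBinary()
		fmt.Println(p, err)
	}
	fmt.Println("builds:", buildCount)
}
```

In a Ginkgo suite the same effect is usually achieved by doing the build in the suite-level setup instead of in each spec.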

huchen2021 avatar Feb 28 '22 09:02 huchen2021

I added some timing checkpoints and got some data. The total run took 21m25.592598031s, which included e2e_test (4m31.395224658s) and reuse_test (5m43.749294255s). I'm not sure of the exact time for md_scale_test, because it reports errors when stopping the container at the very end of the test. It should be more than 7m. The detailed data is as follows:

  • setupBootstrapCluster: 1m8.094087766s
  • initBootstrapCluster: 1m4.417632025s

e2e_test: Total: 4m31.395224658s

  • BeforeEach: time elapse: 45.860970377s
  • Creating byohost capacity pool: time elapse: 3.294036855s
  • creating a workload cluster: time elapse: 3m52.242673924s
  • dumpSpecResourcesAndCleanup: time elapse: 10.610023703s
  • clean up byoh container and files: time elapse: 21.341578337s

reuse_test: Total: 5m43.749294255s

  • BeforeEach: 40.732290086s
  • Creating byohost capacity pool: 2.968998621s
  • Creating a cluster: 2m51.500562216s
  • Delete the cluster and freeing the ByoHosts: 10.060083121s
  • Creating a new cluster: 2m0.959306317s
  • dumpSpecResourcesAndCleanup: 10.426351336s
  • clean up byoh container and files: 21.062230394s

md_scale_test: Total: (no value, because it reports errors when stopping the container)

  • BeforeEach: 40.594427525s
  • Creating byohost capacity pool: 7.89641687s
  • Creating a workload cluster: 6m1.652912815s
  • Scaling the MachineDeployment out to 3: 50.432745806s
  • Scaling the MachineDeployment out to 3: 10.14809982s
  • dumpSpecResourcesAndCleanup: 10.529640877s

huchen2021 avatar Mar 07 '22 06:03 huchen2021