Automated upgrade test fails

Open bjee19 opened this issue 1 year ago • 1 comments
Describe the bug

The automated NFR test in upgrade_test.go fails.

To Reproduce

  1. Run the automated upgrade test.

Expected behavior

The test should match the 1.2 results.

Your environment

  • Version of the NGINX Gateway Fabric - edge
  • Version of Kubernetes - 1.28.7-gke.1026000
  • Kubernetes platform (e.g. Mini-kube or GCP) - GKE
  • Details on how you expose the NGINX Gateway Fabric Pod (e.g. Service of type LoadBalancer or port-forward) - LoadBalancer

Additional context

[FAILED] [159.897 seconds]
Upgrade testing [It] upgrades NGF with zero downtime [nfr, upgrade]
/home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:83

  [FAILED] Expected success, but got an error:
      <*fmt.wrapError | 0xc00102f360>: 
      client rate limiter Wait returned an error: context deadline exceeded
      {
          msg: "client rate limiter Wait returned an error: context deadline exceeded",
          err: <context.deadlineExceededError>{},
      }
  In [It] at: /home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:210 @ 04/29/24 20:15:02.521

  Full Stack Trace
    github.com/nginxinc/nginx-gateway-fabric/tests/suite.init.func8.3.2({0x1dcd6500?, 0xc00006ea80?})
        /home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:210 +0xb6
    k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2(0xc0004c7a88?, {0x1e61310?, 0xc00104ea80?})
        /home/username/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:87 +0x52
    k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x1e61310, 0xc00104ea80}, {0x1e56418, 0xc00102f0a0}, 0x1, 0x0, 0xc0004c7e90)
        /home/username/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:88 +0x24d
    k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x1e61310, 0xc00104ea80}, 0xdf8475800?, 0x1, 0xc0009e3e90)
        /home/username/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33 +0x56
    github.com/nginxinc/nginx-gateway-fabric/tests/suite.init.func8.3()
        /home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:205 +0x97e


This chunk of code in upgrade_test.go:

var lease coordination.Lease
key := types.NamespacedName{Name: "ngf-test-nginx-gateway-fabric-leader-election", Namespace: ngfNamespace}
Expect(wait.PollUntilContextCancel(
	leaseCtx,
	500*time.Millisecond,
	true, /* poll immediately */
	func(_ context.Context) (bool, error) {
		Expect(k8sClient.Get(leaseCtx, key, &lease)).To(Succeed())

		if lease.Spec.HolderIdentity != nil {
			for _, podName := range podNames {
				if podName == *lease.Spec.HolderIdentity {
					return true, nil
				}
			}
		}

		return false, nil
	},
)).To(Succeed())

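This will fail because *lease.Spec.HolderIdentity always contains a hash after the podName, so the exact comparison podName == *lease.Spec.HolderIdentity never matches and the poll keeps retrying until leaseCtx hits its deadline. For example, the holder identity looks like: my-release-nginx-gateway-fabric-fd4bc4cb6-dx7dp_9e75f1dc-3ed4-4ff5-969a-0c98940a7721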
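
One possible way to make the comparison tolerate the suffix is sketched below. This is only an illustration, not the fix that was merged: the helper names holderMatchesPod and leaderAmong are hypothetical, and the sketch assumes the holder identity always has the form podName, an underscore, then a unique ID, as in the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// holderMatchesPod reports whether a Lease holder identity belongs to podName.
// Assumption (taken from the example in this issue): the identity has the form
// "<podName>_<unique-id>". A bare "<podName>" is accepted as well.
func holderMatchesPod(holderIdentity, podName string) bool {
	return holderIdentity == podName ||
		strings.HasPrefix(holderIdentity, podName+"_")
}

// leaderAmong returns the first pod name that matches the holder identity.
// It returns false when the identity is nil or no pod matches.
func leaderAmong(holderIdentity *string, podNames []string) (string, bool) {
	if holderIdentity == nil {
		return "", false
	}
	for _, podName := range podNames {
		if holderMatchesPod(*holderIdentity, podName) {
			return podName, true
		}
	}
	return "", false
}

func main() {
	// Holder identity copied from the example in this issue.
	holder := "my-release-nginx-gateway-fabric-fd4bc4cb6-dx7dp_9e75f1dc-3ed4-4ff5-969a-0c98940a7721"
	podNames := []string{"my-release-nginx-gateway-fabric-fd4bc4cb6-dx7dp"}

	name, ok := leaderAmong(&holder, podNames)
	fmt.Println(name, ok)
}
```

In the test, the body of the poll function could call such a helper instead of the == check. Matching on podName plus the underscore, rather than on podName alone, avoids a false match when one pod name is a prefix of another.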

bjee19 avatar Apr 26 '24 23:04 bjee19

We may also want to run the other automated tests to check that they still work correctly.

bjee19 avatar Apr 26 '24 23:04 bjee19

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar May 14 '24 02:05 github-actions[bot]

@bjee19 Is this still an issue? If so, we should prioritize this so it doesn't fail when we do release testing.

sjberman avatar May 14 '24 14:05 sjberman

@sjberman yep, just re-ran the test on main and got the same error.

bjee19 avatar May 14 '24 15:05 bjee19

Re-ran the rest of the automated tests with no issues found. Also re-ran the longevity test with the time set to 5m, just to ensure the automation around it was still working correctly. I didn't do any analysis; my only focus was to ensure the automation was working correctly.

ciarams87 avatar May 22 '24 10:05 ciarams87