nginx-gateway-fabric
Automated upgrade test fails
Describe the bug
The automated NFR upgrade test (upgrade_test.go) fails.
To Reproduce
- Run automated upgrade test.
Expected behavior
Test should match 1.2 results.
Your environment
- Version of the NGINX Gateway Fabric - edge
- Version of Kubernetes - 1.28.7-gke.1026000
- Kubernetes platform (e.g. Mini-kube or GCP) - GKE
- Details on how you expose the NGINX Gateway Fabric Pod (e.g. Service of type LoadBalancer or port-forward) - LoadBalancer
Additional context
[FAILED] [159.897 seconds]
Upgrade testing [It] upgrades NGF with zero downtime [nfr, upgrade]
/home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:83
[FAILED] Expected success, but got an error:
<*fmt.wrapError | 0xc00102f360>:
client rate limiter Wait returned an error: context deadline exceeded
{
msg: "client rate limiter Wait returned an error: context deadline exceeded",
err: <context.deadlineExceededError>{},
}
In [It] at: /home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:210 @ 04/29/24 20:15:02.521
Full Stack Trace
github.com/nginxinc/nginx-gateway-fabric/tests/suite.init.func8.3.2({0x1dcd6500?, 0xc00006ea80?})
/home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:210 +0xb6
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2(0xc0004c7a88?, {0x1e61310?, 0xc00104ea80?})
/home/username/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:87 +0x52
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x1e61310, 0xc00104ea80}, {0x1e56418, 0xc00102f0a0}, 0x1, 0x0, 0xc0004c7e90)
/home/username/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:88 +0x24d
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x1e61310, 0xc00104ea80}, 0xdf8475800?, 0x1, 0xc0009e3e90)
/home/username/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33 +0x56
github.com/nginxinc/nginx-gateway-fabric/tests/suite.init.func8.3()
/home/username/nginx-gateway-fabric/tests/suite/upgrade_test.go:205 +0x97e
This chunk of code in upgrade_test.go:
var lease coordination.Lease
key := types.NamespacedName{Name: "ngf-test-nginx-gateway-fabric-leader-election", Namespace: ngfNamespace}
Expect(wait.PollUntilContextCancel(
	leaseCtx,
	500*time.Millisecond,
	true, /* poll immediately */
	func(_ context.Context) (bool, error) {
		Expect(k8sClient.Get(leaseCtx, key, &lease)).To(Succeed())
		if lease.Spec.HolderIdentity != nil {
			for _, podName := range podNames {
				if podName == *lease.Spec.HolderIdentity {
					return true, nil
				}
			}
		}
		return false, nil
	},
)).To(Succeed())
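will fail because *lease.Spec.HolderIdentity always contains an extra identifier appended to the pod name after an underscore, so the equality check against podName never matches. The poll then appears to keep running until leaseCtx expires, at which point the Get call returns the "context deadline exceeded" error shown above. Example holder identity: my-release-nginx-gateway-fabric-fd4bc4cb6-dx7dp_9e75f1dc-3ed4-4ff5-969a-0c98940a7721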
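One possible way to make the comparison tolerant of the suffix is sketched below. It is not taken from the repository: the helper name holderIsOneOf is hypothetical, and it assumes the holder identity is either the bare pod name or the pod name followed by an underscore and an arbitrary suffix, as in the example identity in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// holderIsOneOf is a hypothetical helper (not part of the NGF test suite).
// It reports whether holderIdentity belongs to one of podNames, accepting
// either an exact match or the form "<podName>_<suffix>".
func holderIsOneOf(holderIdentity string, podNames []string) bool {
	for _, podName := range podNames {
		if holderIdentity == podName ||
			strings.HasPrefix(holderIdentity, podName+"_") {
			return true
		}
	}
	return false
}

func main() {
	// Holder identity and pod name taken from the example in this issue.
	holder := "my-release-nginx-gateway-fabric-fd4bc4cb6-dx7dp_9e75f1dc-3ed4-4ff5-969a-0c98940a7721"
	pods := []string{"my-release-nginx-gateway-fabric-fd4bc4cb6-dx7dp"}

	fmt.Println(holderIsOneOf(holder, pods))
}
```

Inside the poll function, the inner loop over podNames could then be replaced by a single call such as holderIsOneOf(*lease.Spec.HolderIdentity, podNames). Matching on podName plus the underscore, rather than on the bare prefix, avoids a false match when one pod name is a prefix of another.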
We may also want to run the other automated tests to check that they still work correctly.
This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.
@bjee19 Is this still an issue? If so, we should prioritize this so it doesn't fail when we do release testing.
@sjberman yep, just re-ran the test on main and got the same error.
Re-ran the rest of the automated tests and found no issues. Also re-ran the longevity test with the duration set to 5m, just to ensure the automation around it still works. I didn't do any analysis; my only focus was to confirm that the automation works correctly.