nginx-gateway-fabric

Log Collection Errors during Longevity Test

Open mpstefan opened this issue 1 year ago • 1 comment

When running a longevity test:

In the NGF error log, log entries like the following were not collected during the 1.2.0 run:

ERROR 2024-03-20T14:13:00.374159128Z [resource.labels.containerName: nginx-gateway] {"error":"leader election lost", "level":"error", "msg":"error received after stop sequence was engaged", "stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1 sigs.k8s.io/[email protected]/pkg/manager/internal.go:490", "ts":"2024-03-20T14:13:00Z"}

In the NGINX container's error log, entries like the following were not collected during the 1.2.0 run:

ERROR 2024-03-17T21:11:11.017601264Z [resource.labels.containerName: nginx] 2024/03/17 21:11:10 [error] 43#43: *211045372 no live upstreams while connecting to upstream, client: 10.128.0.19, server: cafe.example.com, request: "GET /tea HTTP/1.1", upstream: "http://longevity_tea_80/tea", host: "cafe.example.com"
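
To check whether a collection pipeline picks up both kinds of entry, a small classifier could be run over exported payloads. This is a minimal Python sketch, not part of the longevity test tooling: `classify` and `NGINX_ERROR_RE` are hypothetical names, the NGF sample is abbreviated (stack trace omitted), and it assumes the `ERROR <timestamp> [resource.labels.containerName: ...]` prefix added by the log viewer has already been stripped.

```python
import json
import re

# nginx error-log line: "YYYY/MM/DD HH:MM:SS [level] pid#tid: *conn message"
NGINX_ERROR_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#(?P<tid>\d+): "
    r"(?:\*(?P<conn>\d+) )?(?P<message>.*)$"
)

# Severities treated as errors for the nginx container (assumption).
NGINX_ERROR_LEVELS = {"error", "crit", "alert", "emerg"}


def classify(payload: str):
    """Return (container, level, message) for an error entry, else None."""
    payload = payload.strip()
    if payload.startswith("{"):
        # NGF (nginx-gateway container) writes structured JSON entries.
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            return None
        if isinstance(entry, dict) and entry.get("level") == "error":
            return ("nginx-gateway", "error", entry.get("error") or entry.get("msg", ""))
        return None
    # Otherwise try the plain-text nginx error-log format.
    match = NGINX_ERROR_RE.match(payload)
    if match and match.group("level") in NGINX_ERROR_LEVELS:
        return ("nginx", match.group("level"), match.group("message"))
    return None


# Samples taken from the entries quoted above (NGF stack trace omitted).
NGF_SAMPLE = (
    '{"error":"leader election lost", "level":"error", '
    '"msg":"error received after stop sequence was engaged", '
    '"ts":"2024-03-20T14:13:00Z"}'
)
NGINX_SAMPLE = (
    '2024/03/17 21:11:10 [error] 43#43: *211045372 no live upstreams while '
    'connecting to upstream, client: 10.128.0.19, server: cafe.example.com, '
    'request: "GET /tea HTTP/1.1", upstream: "http://longevity_tea_80/tea", '
    'host: "cafe.example.com"'
)

if __name__ == "__main__":
    for sample in (NGF_SAMPLE, NGINX_SAMPLE):
        print(classify(sample))
```

If either sample comes back as `None` after a change to the collection filter, that filter would likely miss the corresponding entries in a real run.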

mpstefan, Mar 21 '24 14:03

Some ways we can simulate the above errors for testing:

  1. Change the RBAC rules so that NGF cannot access the leader election resource.
  2. Create an HTTPRoute for an application that does not accept traffic on port 80.

OR have NGF write the example entries above to its logs.

OR insert the example log entries directly into the logging platform.
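
For option 2, the route could be generated with something like the sketch below. It only builds a Gateway API `HTTPRoute` manifest and prints it as JSON; it does not talk to the cluster. The hostname and path come from the nginx entry quoted in the issue. Everything else is an assumption for illustration: the helper `build_http_route`, the route name `tea-no-listener`, the namespace `longevity`, the Gateway name `gateway`, and a Service `tea` that exposes port 80 while the pods behind it do not listen on that port.

```python
import json


def build_http_route(name, namespace, gateway, hostname, path, service, port=80):
    """Build a Gateway API HTTPRoute manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Attach the route to the Gateway that NGF manages.
            "parentRefs": [{"name": gateway}],
            "hostnames": [hostname],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": path}}],
                    # Backend Service whose pods are assumed not to accept
                    # traffic on this port.
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


route = build_http_route(
    name="tea-no-listener",       # hypothetical route name
    namespace="longevity",        # assumed namespace
    gateway="gateway",            # assumed Gateway name
    hostname="cafe.example.com",  # from the nginx log entry
    path="/tea",                  # from the nginx log entry
    service="tea",                # assumed Service name
)

if __name__ == "__main__":
    print(json.dumps(route, indent=2))
```

Since `kubectl` accepts JSON manifests, the output could be piped into `kubectl apply -f -`. Whether this reproduces exactly the `no live upstreams` message would need to be confirmed in a test cluster.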

mpstefan, Jun 24 '24 15:06