nginx-gateway-fabric

Investigate socket errors reported by wrk during tests

Open · pleshakov opened this issue 2 years ago · 5 comments

Describe the bug

The longevity tests (https://github.com/nginxinc/nginx-gateway-fabric/pull/1113) reported a large number of socket read errors from wrk; see https://github.com/nginxinc/nginx-gateway-fabric/blob/6e1c9d82030cfa1f5c2453c60cb26d087b43c4e5/tests/longevity/results/1.0.0/1.0.0.md#traffic

These errors can be reproduced within a short test run (30s) if NGINX is reloaded during the run.

It is not clear what causes the errors or whether they can lead to downtime as perceived by clients.

To Reproduce

  • Deploy NGF with one replica.
  • Deploy the following manifests from https://github.com/nginxinc/nginx-gateway-fabric/tree/main/tests/longevity/manifests: cafe.yaml, cafe-secret.yaml, gateway.yaml, and cafe-routes.yaml.
  • Edit /etc/hosts on your machine so that cafe.example.com resolves to NGF (the pod IP is fine; there is no need for an external LB).
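The setup steps can be scripted roughly as follows. This is a sketch under assumptions: the raw-file URL is derived from the manifests directory linked above and may have moved since, NGF is assumed to run as a single pod in the `nginx-gateway` namespace, and `NGF_NAMESPACE`, `apply_cafe_manifests`, `ngf_pod_ip`, and `hosts_entry` are hypothetical names, not part of the NGF repository.

```sh
# Raw-file base URL derived from the manifests directory linked above.
# Assumption: the files still live at this path on the main branch.
MANIFEST_BASE="https://raw.githubusercontent.com/nginxinc/nginx-gateway-fabric/main/tests/longevity/manifests"

# Apply the four manifests listed in the steps, stopping at the first failure.
apply_cafe_manifests() {
  for f in cafe.yaml cafe-secret.yaml gateway.yaml cafe-routes.yaml; do
    kubectl apply -f "$MANIFEST_BASE/$f" || return 1
  done
}

# Print the IP of the first pod in the NGF namespace.
# Assumption: NGF is the only pod in "nginx-gateway" (override with NGF_NAMESPACE).
ngf_pod_ip() {
  kubectl get pods -n "${NGF_NAMESPACE:-nginx-gateway}" \
    -o jsonpath='{.items[0].status.podIP}'
}

# Print an /etc/hosts line that points cafe.example.com at the given IP.
hosts_entry() {
  printf '%s cafe.example.com\n' "$1"
}
```

Usage: `apply_cafe_manifests && hosts_entry "$(ngf_pod_ip)" | sudo tee -a /etc/hosts`.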

HTTP check. Run:

wrk -t2 -c100 -d30s http://cafe.example.com/coffee

At the same time, restart the rollout of the coffee pods, which causes a series of NGINX reloads:

kubectl rollout restart deployment/coffee

See the socket errors in the output of wrk. Example:

 Socket errors: connect 0, read 257, write 0, timeout 0
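The check depends on the rollout restart overlapping the 30s wrk run. A minimal sketch of scripting that overlap follows; `run_with_disruption` and its 5-second default delay are hypothetical choices, not part of the NGF test suite. `kubectl rollout restart` returns as soon as the restart is triggered, so the pod replacements (and the resulting NGINX reloads) continue while wrk is still running.

```sh
# Hypothetical helper: run a load command in the background, trigger a
# disruption after a delay, then wait for the load command and print its output.
run_with_disruption() {
  load_cmd="$1"
  disrupt_cmd="$2"
  delay="${3:-5}"   # seconds to let the load warm up before disrupting (arbitrary default)

  load_out="$(mktemp)" || return 1
  sh -c "$load_cmd" >"$load_out" 2>&1 &
  load_pid=$!

  sleep "$delay"
  # Send the disruption's output to stderr so stdout carries only the load output.
  sh -c "$disrupt_cmd" >&2

  # Propagate the load command's exit status after printing its output.
  wait "$load_pid"
  load_status=$?
  cat "$load_out"
  rm -f "$load_out"
  return "$load_status"
}
```

For the HTTP check: `run_with_disruption 'wrk -t2 -c100 -d30s http://cafe.example.com/coffee' 'kubectl rollout restart deployment/coffee'`. The HTTPS check below swaps in the tea URL and deployment.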

HTTPS check. Run:

wrk -t2 -c100 -d30s https://cafe.example.com/tea

At the same time, restart the rollout of the tea pods, which causes a series of NGINX reloads:

kubectl rollout restart deployment/tea

See the socket errors in the output of wrk. Example:

  Socket errors: connect 0, read 290, write 0, timeout 0

Expected behavior

No errors.
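The expected behavior can be checked mechanically from saved wrk output. As far as I can tell from wrk 4.x, the `Socket errors:` line is printed only when at least one counter is non-zero, so a missing line is treated as zero errors here. `socket_error_total` and `assert_no_socket_errors` are hypothetical helper names.

```sh
# Sum the connect/read/write/timeout counters on wrk's "Socket errors:" line
# (read from stdin). Prints 0 when the line is absent.
socket_error_total() {
  awk '/Socket errors:/ {
         for (i = 1; i <= NF; i++) {
           # Counters look like "257," except the last, which has no trailing comma.
           if ($i ~ /^[0-9]+,?$/) { n = $i; sub(/,$/, "", n); total += n }
         }
       }
       END { print total + 0 }'
}

# Return non-zero (and report on stderr) when the wrk output on stdin
# contains any socket errors.
assert_no_socket_errors() {
  errs="$(socket_error_total)"
  if [ "$errs" -ne 0 ]; then
    echo "wrk reported $errs socket errors" >&2
    return 1
  fi
}
```

Usage: `wrk -t2 -c100 -d30s http://cafe.example.com/coffee > wrk.out; assert_no_socket_errors < wrk.out`.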

Your environment

Kubernetes: Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3-gke.100", GitCommit:"6466b51b762a5c49ae3fb6c2c7233ffe1c96e48c", GitTreeState:"clean", BuildDate:"2023-06-23T09:27:28Z", GoVersion:"go1.20.5 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

wrk -v
wrk debian/4.1.0-3 [epoll] Copyright (C) 2012 Will Glozer

NGF:

"version":"edge","commit":"5324908e6e1145bec5f2f0ab80b312a809ad1744","date":"2023-10-13T18:29:23Z"

The machine running wrk is on the same network as the Kubernetes cluster and has direct access to the pods.

Additional Info

  • The same errors occur with NGINX Ingress Controller (NIC): Version=3.3.1 Commit=0f828bb5f4159d7fb52bcff0159d1ddd99f16f87 Date=2023-10-13T16:23:42Z DirtyState=false Arch=linux/amd64 Go=go1.21.3

pleshakov, Oct 17 '23 13:10