Lua script freezes nginx worker
Hello! I am having trouble with nginx-ingress reloads. My theory is that the k8s cluster contains so many Ingress resources that Lua blocks the nginx worker and clients see 502 errors. I tried to recreate this scenario, and it is indeed possible to reproduce it on a test stand:
pid /tmp/nginx/nginx.pid;
daemon off;
worker_processes 2;
worker_shutdown_timeout 240s;
events {
    multi_accept on;
    worker_connections 16384;
    use epoll;
}
http {
    lua_shared_dict dummy 1m;
    server {
        listen 8080;
        location / {
            return 200 "OK\n";
        }
    }
    init_worker_by_lua_block {
        ngx.log(ngx.ERR, "Heavy math start")
        local acc = 0
        for i = 1, 1e8 do
            acc = acc + math.sqrt(i) * math.sin(i) / (i + 1)
        end
        ngx.log(ngx.ERR, "Heavy math done")
    }
}
docker run --rm -it -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro -p 80:8080 registry.k8s.io/ingress-nginx/controller:v1.13.0 nginx
Depending on how powerful your machine is, you will see that nginx becomes ready only after 10-20 seconds.
The same thing happens when you run nginx -s reload: the master process starts routing traffic to the new workers BEFORE they are ready for it.
Is there any way to force the nginx master process to wait until init_worker_by_lua_block { } has fully completed, and only then start routing traffic to the new workers?
I’m quite interested in this topic, but according to Nginx’s design philosophy, a high-performance, I/O-intensive application should not perform any CPU-intensive tasks. A better approach might be to have a separate process or thread handle the computation of routing rules, and then have the Nginx worker’s Lua code simply load and apply the results.
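A minimal sketch of that split, assuming a hypothetical helper process has already written the computed routes to a plain-text file with one "host backend" pair per line. The file format, the sample hosts, and the name load_routes are illustrative assumptions, not part of ingress-nginx. The worker side then only parses the file, which should be cheap compared with computing the rules:

```lua
-- Hypothetical loader: parse precomputed "<host> <backend>" lines into a table.
-- The file format and the name load_routes are assumptions for illustration.
local function load_routes(path)
    local routes = {}
    local f, err = io.open(path, "r")
    if not f then
        return nil, err
    end
    for line in f:lines() do
        -- Lines that do not look like "<host> <backend>" are skipped.
        local host, backend = line:match("^(%S+)%s+(%S+)%s*$")
        if host then
            routes[host] = backend
        end
    end
    f:close()
    return routes
end

-- Usage example with a temporary file standing in for the helper's output.
local path = os.tmpname()
local out = assert(io.open(path, "w"))
out:write("a.example.com 10.0.0.1:8080\n\nb.example.com 10.0.0.2:8080\n")
out:close()

local routes = assert(load_routes(path))
print(routes["a.example.com"])  -- 10.0.0.1:8080
os.remove(path)
```

Reading a file is still blocking I/O, so this only helps if the file stays small and the expensive computation really happens outside the worker.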
Also, if we want to ensure that a new Nginx worker is not routed any new traffic before it finishes the init_worker phase, my initial thought is that this would require changes to the upstream Nginx core. It’s very likely such changes wouldn’t be accepted upstream, so in the end it could only be maintained as a custom patch.
It seems this topic has been discussed many times before, though I haven’t paid much attention to it until now. Others more familiar with this can feel free to add or correct my understanding.
"should not perform any CPU-intensive tasks"
You are certainly right, and my config is just an example. My load is not very high, but the traffic appears to be enough to get 502s on reload. Let me present some numbers. I have around 500 ingresses in my k8s cluster and a few ingress-nginx nodes. Each ingress node handles 7k to 10k rps, and on every reload I see around 300 requests end with a 502 status. That is equivalent to nginx freezing for around 40 ms: quite small, but enough to produce 502s.
You can find the basic ingress config here. Maybe you have an idea how I can improve it?
One more thing: I am still not 100% sure that Lua is my main problem. I tried to debug it "printf()"-style (ngx.log(ngx.ERR, "start/stop")) on a production host, and it is really hard, because without traffic both log lines always carry the same timestamp, even with the example from this topic (hello, cache). Under production traffic, the delay between the start and stop log lines in the Lua block is always around 1.5 s.
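The rough arithmetic behind that estimate can be sketched as follows. It assumes, as a simplification, that the request rate is constant and that every request arriving during the stall fails; the function name stall_ms is only illustrative.

```lua
-- Estimate how long a node was effectively not serving requests,
-- given the number of failed requests and the steady request rate.
-- Assumption: constant rate, and every request during the stall fails.
local function stall_ms(failed_requests, rps)
    return failed_requests / rps * 1000
end

print(string.format("%.1f ms", stall_ms(300, 7000)))   -- 42.9 ms
print(string.format("%.1f ms", stall_ms(300, 10000)))  -- 30.0 ms
```

So 300 failed requests at 7k to 10k rps corresponds to a stall of roughly 30 to 43 ms per reload under these assumptions.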
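One likely reason for the identical timestamps is that nginx caches the current time and a blocking Lua loop gives it no chance to refresh that cache. A minimal sketch of measuring the duration explicitly, assuming the lua-nginx-module API (ngx.update_time and ngx.now) and reusing the loop from the example config; the log message text is made up for illustration:

```lua
-- Body for init_worker_by_lua_block { ... } (OpenResty / lua-nginx-module).
-- ngx.now() returns nginx's cached time, so refresh it with
-- ngx.update_time() before each reading; otherwise both readings can match.
ngx.update_time()
local started = ngx.now()

local acc = 0
for i = 1, 1e8 do
    acc = acc + math.sqrt(i) * math.sin(i) / (i + 1)
end

ngx.update_time()
local elapsed = ngx.now() - started
ngx.log(ngx.ERR, "Heavy math took ", string.format("%.3f", elapsed), " s")
```

Logging the elapsed value directly avoids relying on the log line timestamps at all.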
If the init_worker phase in your scenario only takes a short time, then the 502 errors do not appear to be caused by init_worker execution. A 502 error means the request had already established a TCP connection with Nginx, and the failure occurred when Nginx tried to reach the upstream server.
If the init_worker execution had blocked for too long causing Nginx to freeze, the client would more likely experience a TCP connection timeout instead. You may need to analyze more specifically why the 502 error requests failed to access the upstream server.
Maybe. But I can't reproduce this behavior with vanilla nginx (without lua). I already had a similar experience with nginx + njs module... there was something similar... the same freeze for a few milliseconds, the same 502 errors...
I think the best way to globally fix it is to migrate my k8s cluster from Ingress (nginx+lua) to Gateway (based on envoy-proxy). But maybe someone else has the same problems and can't migrate from nginx+lua.