lua-resty-redis
Redis: timeout
Does the redis client block the nginx event loop in its implementation, or is the read timeout caused by Lua's own coroutine scheduling? I ask because I found that the redis server responds quickly, yet the redis client still reports a timeout because it cannot read the response within 50ms.
the error log:
2024/08/06 15:49:01 [debug] 384947#384947: *171134 [lua] base_plugin.lua:24: access(): executing plugin "rate-limiting": access
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:60: new(): ==========new==========, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket pool get keepalive peer
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket get keepalive peer: using connection 00007FBE83FEB1F8, fd:79
2024/08/06 15:49:01 [debug] 384947#384947: *171134 event timer del: 79: 3647014359
2024/08/06 15:49:01 [debug] 384947#384947: *171134 http cleanup add: 0000000001424E68
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:220: _gen_req(): ======nargs======5, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:231: _gen_req(): ======_gen_req=====eval=====i=====1, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:231: _gen_req(): ======_gen_req===== local cache_key, expiration = KEYS[1], ARGV[1]
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:231: _gen_req(): ======_gen_req=====1=====i=====3, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:231: _gen_req(): ======_gen_req=====ratelimit:d7af6063-037c-47d0-934a-1f19c412ea94:d7af6063-037c-47d0-934a-1f19c412ea94:1722930541000:second:0aa0512166b1d56d3617dfb30304e602=====i=====4, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:231: _gen_req(): ======_gen_req=====1=====i=====5, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket send timeout: 50
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua allocate new chainlink and new buf of size 431, cl:0000000001424EB0
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket send data
2024/08/06 15:49:01 [debug] 384947#384947: *171134 send: fd:79 431 of 431
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket sent all the data
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:268: eval(): =====req======26, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:279: eval(): ===req===1:*5
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:129: eval(): before_receive_time timestamp: 1722930541450, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket calling receive() method
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket read timeout: 50
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua allocate new chainlink and new buf of size 4096, cl:00000000014250C0
2024/08/06 15:49:01 [debug] 384947#384947: *171134 malloc: 0000000001425200:4096
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket read timeout: 50
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket read data: wait:0
2024/08/06 15:49:01 [debug] 384947#384947: *171134 event timer add: 79: 50:3647004499
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua resume returned 1
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua thread yielded
2024/08/06 15:49:01 [debug] 384947#384947: *171134 http run request: "/test_kylin_upstream_timeout?"
2024/08/06 15:49:01 [debug] 384947#384947: *171134 access phase: 14
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua access handler, uri:"/test_kylin_upstream_timeout" c:1
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua run write event handler: timedout:0, ready:1, writing_raw_req_socket:0
2024/08/06 15:49:01 [debug] 384947#384947: *171134 useless lua write event handler
2024/08/06 15:49:01 [debug] 384947#384947: *171134 event timer del: 79: 3647004499
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket handler for "/test_kylin_upstream_timeout?", wev 0
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket read handler
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket handle read error
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket waking up the current request
2024/08/06 15:49:01 [debug] 384947#384947: *171134 access phase: 14
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua access handler, uri:"/test_kylin_upstream_timeout" c:1
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp operation done, resuming lua thread
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket calling prepare retvals handler 0000000000663B16, u:00007FBE8318A408
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua tcp socket receive return value handler
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua run thread, top:0 c:1
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:133: eval(): =========receive time===== 53ms, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [error] 384947#384947: *171134 [lua] redis.lua:134: eval(): =======line=======nil=====err=====timeout=====partial=====, client: 127.0.0.1, server: kylin, request: "GET /test_kylin_upstream_timeout HTTP/1.1", host: "127.0.0.1:8000"
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua finalize socket
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua http cleanup free: 0000000001424E68
2024/08/06 15:49:01 [debug] 384947#384947: *171134 lua close socket connection
Just give us a minimal example that could reproduce this problem. That would help us understand your problem.
You can also use OpenResty XRay to see if there are any off-CPU events.
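For reference, a minimal sketch of the kind of reproduction being asked for, assuming a redis instance at 127.0.0.1:6379 and that this runs in a content_by_lua_block with the same 50ms timeouts as the setup described above; the key name and keepalive values are illustrative:

```lua
-- Hypothetical minimal reproduction; not the reporter's actual plugin code.
-- Assumes redis listens on 127.0.0.1:6379 and this runs in content_by_lua_block.
local redis = require "resty.redis"

local red = redis:new()
red:set_timeouts(50, 50, 50)  -- connect, send, read timeouts in milliseconds

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "failed to connect: ", err)
    return
end

ngx.update_time()
local t0 = ngx.now()

local res, err = red:incrby("ratelimit:test", 1)

ngx.update_time()
local elapsed_ms = (ngx.now() - t0) * 1000

if not res then
    -- err is "timeout" when the reply is not read within the 50ms read timeout
    ngx.log(ngx.ERR, "incrby failed after ", string.format("%.1f", elapsed_ms), "ms: ", err)
    return
end

ngx.say("counter = ", res, " (", string.format("%.1f", elapsed_ms), "ms)")

-- return the connection to the keepalive pool: 10s max idle time, pool of 100
local ok, err = red:set_keepalive(10000, 100)
if not ok then
    ngx.log(ngx.ERR, "failed to set keepalive: ", err)
end
```

Driving this with wrk while watching the error log should show whether the measured elapsed time ever exceeds the 50ms read timeout even though the server replies quickly.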
The Kong gateway uses Redis to rate-limit the service, and the gateway and the redis service are deployed in the same data center. The timeout configured on the gateway, as the redis client, is 50ms.
The problem is that the QPS from the gateway to redis is not high, yet the redis client on the gateway side reports many timeouts. Capturing and analyzing packets with tcpdump shows that the redis server returns its response within 5ms, but the client still reports a timeout because it does not read the response within 50ms.
Machine specification: 4C8G. The container environment carries only test traffic, with no other requests, and wrk is used to send requests in parallel. With rate limiting enabled, each request triggers nginx to interact with the redis server: it looks up the value of a key in redis and increments it with INCRBY, as in the source code: https://github.com/Kong/kong/blob/master/kong/plugins/rate-limiting/policies/init.lua
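For context, the per-request redis round trip of such a counter-based policy looks roughly like the sketch below; the function name, the script body, and the keepalive values are illustrative, not the actual Kong source (only the script's first line appears in the log above):

```lua
-- Rough, hypothetical sketch of a counter-based rate-limiting round trip;
-- see the linked Kong source for the real logic.
local redis = require "resty.redis"

local INCR_SCRIPT = [[
local cache_key, expiration = KEYS[1], ARGV[1]
local newval = redis.call("incrby", cache_key, 1)
if tonumber(newval) == 1 then
    redis.call("expire", cache_key, expiration)
end
return newval
]]

-- host, port, timeout_ms, cache_key and expiration are illustrative parameters
local function increment_counter(host, port, timeout_ms, cache_key, expiration)
    local red = redis:new()
    red:set_timeouts(timeout_ms, timeout_ms, timeout_ms)  -- 50ms each in this setup

    local ok, err = red:connect(host, port)
    if not ok then
        return nil, "failed to connect: " .. err
    end

    -- same shape as the logged request: eval <script> 1 <cache_key> <expiration>
    local newval, err = red:eval(INCR_SCRIPT, 1, cache_key, expiration)
    if not newval then
        return nil, "failed to eval: " .. err  -- the "timeout" error surfaces here
    end

    red:set_keepalive(10000, 100)
    return newval
end
```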
The CPU idle rate is above 85%.
The captured data shows that the redis server responded within 1.5ms.
@zhuizhuhaomeng Hello, do you have any ideas on this problem? Please give me some guidance. Thank you
@liukaiyuliukaiyu
Did you run tcpdump on the gateway or on the redis server?
You may also use perf sched record to check the max latency of the process.
I'm having the same problem that @liukaiyuliukaiyu mentioned. I created a custom plugin in Kong with this library and get several timeouts when connecting and authenticating through Redis. My setup is Kong DB-less with Redis Cluster, both in Kubernetes, sometimes on the same nodes.
A few error/debug messages from my setup:
[proxy-cache-redis] REDIS: redis-check failed to authenticate: timeout
fetch_slots(): failed to fetch slots: failed to fetch cluster info: timeout
redis.lua:200 failed to commit the cache value to Redis: timeout
I already tried increasing the connection timeout to larger values, and the problem remains. I already ran tcpdump and I see exactly the same thing as @liukaiyuliukaiyu. I already checked file descriptors, and they are limited to the maximum defined by nginx and OpenResty Lua, 16384.
Any idea or guidance to help track down this issue?
My redis connection settings:
"redis_hosts": "api-redis.trf-public.svc.cluster.local"
"redis_password": xxx
"redis_port": "6379"
"redis_timeout": "150"
"redis_database": "0"
"redis_ssl": "false"
"redis_ssl_verify": "false"
"pool_size": "256"
"backlog": "16384"
"max_idle_timeout": "5000"
"redis_headless": "api-redis-headless.trf-public.svc.cluster.local"
https://github.com/Kong/kong/issues/13473 With this in mind, our conclusion so far is that iterations of the nginx user-space event loop are taking too long, so the worker does not get back to reading the response before the 50ms timer fires.
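One way to check that hypothesis from inside a worker is to watch how late a short repeating timer fires; if the worker's event loop is stalled, the drift shows up directly. A minimal sketch, assuming it runs from init_worker_by_lua_block (the 20ms reporting threshold is arbitrary):

```lua
-- Hypothetical event-loop drift probe; not part of Kong or lua-resty-redis.
-- Intended for init_worker_by_lua_block. A 10ms repeating timer should fire
-- roughly every 10ms; if the worker's event loop is stalled, "late by" grows
-- toward the 50ms redis read timeout seen in the logs above.
local interval = 0.01  -- seconds
local last

local ok, err = ngx.timer.every(interval, function()
    ngx.update_time()
    local now = ngx.now()
    if last then
        local late_ms = (now - last - interval) * 1000
        if late_ms > 20 then  -- arbitrary reporting threshold
            ngx.log(ngx.WARN, "event loop late by ", string.format("%.1f", late_ms), "ms")
        end
    end
    last = now
end)
if not ok then
    ngx.log(ngx.ERR, "failed to create timer: ", err)
end
```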
@liukaiyuliukaiyu did you already apply any mitigation or workaround for this issue? I'm using Azure Fsv16 machines and only 8 Kong workers per pod, so that a set of resources is left for the k8s system and the machines are not exhausted.
We plan to upgrade the machine specifications or split the traffic to solve this problem, but this has not been implemented yet. If you make any new progress or find a solution, please leave a message here so we can compare notes. Thank you!