100% CPU in shared dict :get()
When using a shared dict where all nginx workers repeatedly :get() the same 1-2 keys under a high number of simultaneous requests, some nginx worker processes seem to get stuck in a deadlock on the dict's lock. They stay at 100% CPU load even after the number of requests has subsided (all remaining connections are idle keepalive).
stub_status
Active connections: 102
server accepts handled requests
 39308 39308 339293
Reading: 0 Writing: 287 Waiting: 93
strace shows the following (it can take up to a minute for another line to appear, which is a further indicator that this is not in nginx but in userland Lua):
futex(0x7f7ad4c37080, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
https://mailman.nginx.org/pipermail/nginx/2017-September/054687.html reports a similar issue in nginx; however, nginx doesn't natively use mutex and the issue could clearly be traced back to Lua code.
Checking the nginx processes with pstack:
#0 0x0000000000438c86 in ngx_shmtx_lock (mtx=0x7f7ad4c37068) at src/core/ngx_shmtx.c:86
#1 0x0000000000527937 in ngx_http_lua_ffi_shdict_get (zone=0xbd5cc0, key=0x7f7ace894b60 "REDACTED-2", key_len=12, value_type=0x7f7ad4099928, str_value_buf=0x7f7ad409c340, str_value_len=0x7f7ad40b8850, num_value=0x7f7ad4093b40, user_flags=0x7f7ad4093b20, get_stale=0, is_stale=0x7f7ad409c300, err=0x7f7ad40bc318) at ../ngx_lua-0.10.26/src/ngx_http_lua_shdict.c:1593
#2 0x00007f7ad635dd19 in ?? ()
#3 0x00007f7ad4093b40 in ?? ()
#4 0x00007f7ad4093b20 in ?? ()
#5 0x0000000000000000 in ?? ()
https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L1568 (which calls into nginx at https://github.com/nginx/nginx/blob/master/src/core/ngx_shmtx.c#L70C1-L70C15) shows that "get" acquires the shared dict's lock.
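To make the question more concrete, here is a simplified model of a spin-then-yield lock whose lock word stores the owner's PID, which is roughly how I read ngx_shmtx_lock. This is only a sketch, not nginx's code: all sketch_* names are hypothetical, C11 atomics stand in for nginx's own atomic operations, a bounded number of rounds replaces the endless loop, and sched_yield() stands in for the semaphore wait. It illustrates that a waiter keeps burning CPU for as long as the lock word stays non-zero, for example if the holder never releases it.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ngx_shmtx_t: 0 means free, otherwise the owner's PID. */
typedef struct {
    atomic_ulong  lock;
    unsigned long spin;   /* upper bound for the busy-wait back-off */
} sketch_shmtx_t;

static void sketch_init(sketch_shmtx_t *mtx, unsigned long spin)
{
    atomic_init(&mtx->lock, 0UL);
    mtx->spin = spin;
}

/* One attempt: succeeds only if the lock word is currently 0. */
static int sketch_trylock(sketch_shmtx_t *mtx, unsigned long pid)
{
    unsigned long expected = 0;
    return atomic_compare_exchange_strong(&mtx->lock, &expected, pid);
}

/* Bounded lock (assumption: the real lock loops forever instead).
 * Returns 1 once the lock is taken, 0 after max_rounds failed rounds. */
static int sketch_lock_bounded(sketch_shmtx_t *mtx, unsigned long pid,
                               unsigned max_rounds)
{
    for (unsigned round = 0; round < max_rounds; round++) {
        if (sketch_trylock(mtx, pid)) {
            return 1;
        }
        /* Exponential back-off: busy-wait 1, 2, 4, ... iterations, then retry. */
        for (unsigned long n = 1; n < mtx->spin; n <<= 1) {
            for (volatile unsigned long i = 0; i < n; i++) {
                /* pure userland busy wait, no syscall is made here */
            }
            if (sketch_trylock(mtx, pid)) {
                return 1;
            }
        }
        sched_yield();   /* stand-in for sleeping on a semaphore */
    }
    return 0;
}

/* Only the recorded owner can release the lock. */
static int sketch_unlock(sketch_shmtx_t *mtx, unsigned long pid)
{
    unsigned long expected = pid;
    return atomic_compare_exchange_strong(&mtx->lock, &expected, 0UL);
}
```

In this model, if PID 100 takes the lock and never calls sketch_unlock(), every sketch_lock_bounded() call from another PID spins through all of its rounds and fails. With an unbounded loop, that would look like a worker stuck in the lock function at 100% CPU with only occasional syscalls.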
Is the lock for "get" really necessary? Is there a way to disable it? Any ideas what could cause this? Is it possibly not related to the :get() at all?
Possibly similar to https://github.com/openresty/lua-nginx-module/issues/1207 ?