lua-resty-lock
memory leak under high GC pressure
Using lua-resty-lock to create and dispose of a large number of locks causes the internal `memo` table to behave incorrectly.

We injected some debugging code into lock.lua and logged the "leaked" objects in the `memo` table:
```diff
--- /usr/local/openresty/lualib/resty/lock.lua	2022-07-27 21:12:38.000000000 -0700
+++ resty_lock.lua	2022-07-05 04:22:57.000000000 -0700
@@ -25,6 +25,37 @@
 local memo = {}
 if debug then _M.memo = memo end
+local l = ngx.now()
+local nkeys = require "table.nkeys"
+local function dd()
+    if ngx.now() - l <= 1 then
+        return
+    end
+    l = ngx.now()
+    local total = nkeys(memo)
+    local used = 0
+    for _, k in pairs(memo) do
+        if type(k) ~= "number" then
+            used = used + 1
+        end
+    end
+
+    local unused = 0
+    local ii = FREE_LIST_REF
+    while unused < total do
+        local p = memo[ii]
+        if type(p) == "number" then
+            unused = unused + 1
+        else
+            break
+        end
+        ii = p
+    end
+
+
+    ngx.log(ngx.ERR, " --------- keys: ", total, " used: ", used, " unused: ", unused, " leak: ", total - used - unused)
+end
+
 local function ref_obj(key)
     if key == nil then
@@ -36,8 +67,17 @@
     else
         ref = #memo + 1
+
+        if ref - 1 < nkeys(memo) then
+            ngx.log(ngx.ERR, "########### table tail ", ref - 1, " is smaller than nkeys(memo) ", nkeys(memo), "??????")
+        end
     end
+    if type(key) == "number" then
+        ngx.log(ngx.ERR, "########### got a number ", key, " ", debug.traceback())
+    end
+
     memo[ref] = key
+    dd()
     -- print("ref key_id returned ", ref)
     return ref
@@ -219,3 +259,4 @@
 return _M
+
```
Create a test script:

```lua
local lock = require "resty_lock"
local t = 0
while true do
    local locks = {}
    t = t + 1
    for i = 1, 5000 do
        locks[i] = lock:new("locks")
        locks[i]:lock(t .. "a" .. i)
        locks[i]:unlock(t .. "a" .. i)
    end
    ngx.sleep(0.1)
end
```
and a file to inject the shm directive:

```
$ cat lock_include.conf
lua_shared_dict locks 8m;
```
Then run with `resty --http-include lock_include.conf test.lua`; you will observe the `leak` count increase, sometimes very slowly, sometimes very fast. In one case I was able to observe 10k keys leaked per second:

```
$ resty --http-include lock_include.conf lock4.lua
2022/07/27 22:20:19 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd(): --------- keys: 22301 used: 10001 unused: 0 leak: 12300, context: ngx.timer
2022/07/27 22:20:20 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd(): --------- keys: 36093 used: 10001 unused: 2313 leak: 23779, context: ngx.timer
2022/07/27 22:20:21 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd(): --------- keys: 46141 used: 10001 unused: 0 leak: 36140, context: ngx.timer
```
(For convenience, I attached the code used above in an archive, lock_leak.tgz.)

In our production environment we also see this leak grow continuously until it hits the LuaJIT table limit and throws a `table overflow` error.
We also investigated the unreleased keys in `memo` and found that all of them are refs to the shdict object created at https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L90. It seems that while `ref_obj` is running (https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L33-L40), inserting the new key may allocate memory; LuaJIT may then enter a GC cycle whose finalizers call `unref_obj`, which breaks the free-list links in the `memo` table.
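For context, the ref/unref machinery in question is a small free-list allocator over the `memo` table. The sketch below is paraphrased from lock.lua and runs standalone; the assertions only exercise the happy path, where every `unref_obj` call happens between complete `ref_obj` calls (the corruption appears precisely when that assumption breaks, as noted in the comment):

```lua
-- Sketch of lock.lua's registry: memo[FREE_LIST_REF] heads a linked list of
-- free slots; each free slot stores the index of the next free slot.
local FREE_LIST_REF = 0
local memo = {}

local function ref_obj(key)
    if key == nil then
        return -1
    end
    local ref = memo[FREE_LIST_REF]
    if ref and ref ~= 0 then
        -- reuse a slot from the free list
        memo[FREE_LIST_REF] = memo[ref]
    else
        -- no free slot: grow the array part.
        -- NOTE: between computing #memo + 1 here and the memo[ref] = key
        -- store below, an allocation can trigger a GC cycle whose finalizers
        -- call unref_obj and mutate the very free list being read -- this is
        -- the suspected corruption.
        ref = #memo + 1
    end
    memo[ref] = key
    return ref
end

local function unref_obj(ref)
    if ref >= 0 then
        -- push the slot back onto the free list
        memo[ref] = memo[FREE_LIST_REF]
        memo[FREE_LIST_REF] = ref
    end
end

-- happy path: slot 1 is recycled after an unref
local r1 = ref_obj("dict-a")   -- takes slot 1
local r2 = ref_obj("dict-b")   -- takes slot 2
unref_obj(r1)                  -- slot 1 goes onto the free list
local r3 = ref_obj("dict-c")   -- reuses slot 1
assert(r1 == 1 and r2 == 2 and r3 == 1)
```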
We have come up with two approaches:

- Use the undocumented `newproxy` API to replace ref/unref (https://github.com/Kong/lua-resty-lock/pull/1). We observe a 20% performance drop on the lock/unlock API calls.
- Reduce the number of refs to the shdict object (https://github.com/Kong/lua-resty-lock/pull/2). This has no performance penalty, but may not fix the root cause of the problem.
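To illustrate the `newproxy` technique behind the first approach (a simplified standalone sketch, not the actual PR code; the names `make_obj` and the counter are hypothetical): on LuaJIT / Lua 5.1, `newproxy(true)` returns a userdata with its own metatable that honors `__gc`, so per-object cleanup can run directly from the collector instead of routing every object through a shared `memo` table:

```lua
-- Sketch: attach a newproxy userdata to each object so a __gc finalizer runs
-- a cleanup callback when the object becomes garbage (LuaJIT / Lua 5.1 only;
-- newproxy was removed in Lua 5.2).
local collected = 0

local function make_obj(name)
    local obj = { name = name }
    local proxy = newproxy(true)           -- userdata with a fresh metatable
    getmetatable(proxy).__gc = function()
        collected = collected + 1          -- cleanup hook, e.g. release a ref
    end
    obj.proxy = proxy                      -- tie the proxy's lifetime to obj
    return obj
end

local o = make_obj("lock-1")
o = nil
collectgarbage("collect")
collectgarbage("collect")                  -- second cycle runs finalizers
assert(collected == 1)
```

The trade-off reported above is the extra userdata allocation per lock, which showed up as roughly a 20% slowdown on the lock/unlock calls.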
Thanks for your detailed analysis!

Personally, I vote for solution one.

As for solution two: it reduces the number of `unref_obj` calls for `dict_id`, but not the number of `unref_obj` calls for `key_id`. So the problem still exists, just less frequently; it may simply take longer to hit the `table overflow` error.