Memory leak and coredump triggered by ngx.thread.spawn + ngx.location.capture + ngx.exit
Affected version
ngx_lua-0.10.13; the latest code most likely has the same problem.
Reproduction case
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/f1")-- 子请求f1(0.1s 返回结果)
local t2 = ngx.thread.spawn(fetch, "/f2")-- 子请求f2(0.2s 返回结果)
ngx.thread.wait(t1, t2)
ngx.say("example2")
ngx.exit(200)
Stack when the exception occurs (the behavior is undefined: it may be a segmentation fault or an infinite loop)
lj_gc_step enters an infinite loop:
- gc_onestep returns 0
- lim -= 0, so lim never drops and stays greater than 0
- g->gc.state never reaches GCSpause
Analysis
- After light thread t1 returns, the entry thread calls ngx.exit(). ngx_http_lua_handle_exit then finalizes the request (but since r->main->count is not 0, the request resources are actually leaked) and drops the references to light thread t2 and the entry thread, which makes them eligible for garbage collection.
- When light thread t2 returns later, if the lua_State has already been collected, the behavior is undefined (a sketch of a safe waiting pattern follows the scheduling diagram below).
Coroutine scheduling diagram for the reproduction case
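Until the module itself is fixed, the workaround implied by the "good" cases in the nginx.conf further below is to make sure no capture is still pending when ngx.exit() is called: wait on every spawned light thread individually before exiting. Note that ngx.thread.wait(t1, t2) returns as soon as the first of the threads terminates, which is why /example2 still crashes while /example1 and /example4 do not. A minimal sketch of the safe pattern, reusing the /f1 and /f2 subrequest locations from the reproduction case:
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/f1") -- fast subrequest (0.1s)
local t2 = ngx.thread.spawn(fetch, "/f2") -- slow subrequest (0.2s)
-- Wait on each light thread separately so that both captures have completed
-- before the request is terminated. A single ngx.thread.wait(t1, t2) is not
-- enough: it returns as soon as the first thread finishes, leaving the other
-- capture pending.
ngx.thread.wait(t1)
ngx.thread.wait(t2)
ngx.say("done")
ngx.exit(200)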
How to fix it
Idea: prevent the problem up front. If ngx.exit() is called while there are still unfinished captures, raise an error immediately. In fact, ngx.exit() already contains logic that checks whether the request can be terminated:
if (ctx->no_abort
&& rc != NGX_ERROR
&& rc != NGX_HTTP_CLOSE
&& rc != NGX_HTTP_REQUEST_TIME_OUT
&& rc != NGX_HTTP_CLIENT_CLOSED_REQUEST)
{
return luaL_error(L, "attempt to abort with pending subrequests");
}
This check works when there is only a single pending capture subrequest, but it fails when there are multiple captures, as in the reproduction case above. The fix is to change no_abort from a boolean flag into a counter of pending captures: increment it when a capture is created and decrement it when the capture completes. The patch is shown below, followed by a sketch of the behavior expected once it is applied:
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
index 01ef2be..f8bfb2e 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
@@ -500,6 +500,9 @@ typedef struct ngx_http_lua_ctx_s {
int uthreads; /* number of active user threads */
+ int no_aborts; /* prohibit "world abortion" via ngx.exit()
+ and etc */
+
uint16_t context; /* the current running directive context
(or running phase) for the current
Lua chunk */
@@ -538,9 +541,6 @@ typedef struct ngx_http_lua_ctx_s {
unsigned buffering:1; /* HTTP 1.0 response body buffering flag */
- unsigned no_abort:1; /* prohibit "world abortion" via ngx.exit()
- and etc */
-
unsigned header_sent:1; /* r->header_sent is not sufficient for
* this because special header filters
* like ngx_image_filter may intercept
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
index 6ac2cbf..03883e1 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
@@ -354,7 +354,7 @@ ngx_http_lua_ngx_exit(lua_State *L)
#endif
}
- if (ctx->no_abort
+ if (ctx->no_aborts
&& rc != NGX_ERROR
&& rc != NGX_HTTP_CLOSE
&& rc != NGX_HTTP_REQUEST_TIME_OUT
@@ -508,7 +508,7 @@ ngx_http_lua_ffi_exit(ngx_http_request_t *r, int status, u_char *err,
#endif
}
- if (ctx->no_abort
+ if (ctx->no_aborts
&& status != NGX_ERROR
&& status != NGX_HTTP_CLOSE
&& status != NGX_HTTP_REQUEST_TIME_OUT
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
index 826a43c..f7d8de0 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
@@ -616,7 +616,7 @@ ngx_http_lua_ngx_location_capture_multi(lua_State *L)
ngx_array_destroy(extra_vars);
}
- ctx->no_abort = 1;
+ ctx->no_aborts++;
return lua_yield(L, 0);
}
@@ -987,7 +987,7 @@ ngx_http_lua_post_subrequest(ngx_http_request_t *r, void *data, ngx_int_t rc)
if (pr_coctx->pending_subreqs == 0) {
dd("all subrequests are done");
- pr_ctx->no_abort = 0;
+ pr_ctx->no_aborts--;
pr_ctx->resume_handler = ngx_http_lua_subrequest_resume;
pr_ctx->cur_co_ctx = pr_coctx;
}
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
index f7a537e..8d8d0f8 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
@@ -1411,8 +1411,8 @@ user_co_done:
dd("headers sent? %d", r->header_sent || ctx->header_sent);
- if (ctx->no_abort) {
- ctx->no_abort = 0;
+ if (ctx->no_aborts) {
+ ctx->no_aborts--;
return NGX_ERROR;
}
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
index 7dcc6f7..c5f2d20 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
@@ -245,7 +245,7 @@ void ngx_http_lua_cleanup_free(ngx_http_request_t *r,
#define ngx_http_lua_check_if_abortable(L, ctx) \
- if ((ctx)->no_abort) { \
+ if ((ctx)->no_aborts) { \
return luaL_error(L, "attempt to abort with pending subrequests"); \
}
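With the patch applied, the reproduction case is expected to fail fast instead of resuming a collected lua_State: while t2's capture is still pending, no_aborts stays above zero, so ngx.exit() hits the check above and raises "attempt to abort with pending subrequests". This is the behavior implied by the patch rather than a verified test output; a sketch of the call sequence that should now be rejected:
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/f1") -- capture starts, no_aborts becomes 1
local t2 = ngx.thread.spawn(fetch, "/f2") -- second capture, no_aborts becomes 2
ngx.thread.wait(t1) -- by the time this returns, t1's capture is done and no_aborts is back to 1
-- t2's capture is still pending (no_aborts > 0), so this call is now expected
-- to fail with "attempt to abort with pending subrequests" instead of
-- finalizing the request while t2 is still running.
ngx.exit(200)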
Can this be verified on the latest ngx_lua version?
Sure, I will try to reproduce it on the latest version when I get some time.
Can this be verified on the latest ngx_lua version? @zhuizhuhaomeng could you take a look when you have time? Thanks.
Conclusion
The issue can be reproduced on the latest OpenResty 1.27.1.1.
Build environment
Linux 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build configuration (nginx -V output)
nginx version: openresty/1.27.1.1
built by gcc 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
built with OpenSSL 3.4.1 11 Feb 2025
TLS SNI support enabled
configure arguments: --prefix=/home/yyq/workspace/nginx/openresty-1.27.1.1-bin/nginx --with-debug --with-cc-opt='-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2' --add-module=../ngx_devel_kit-0.3.3 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.27 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.37 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.20 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../rds-json-nginx-module-0.17 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.15 --with-ld-opt=-Wl,-rpath,/home/yyq/workspace/nginx/openresty-1.27.1.1-bin/luajit/lib --with-zlib=/home/yyq/workspace/nginx/zlib-1.2.13 --with-pcre=/home/yyq/workspace/nginx/pcre-8.45 --with-openssl=/home/yyq/workspace/nginx/openssl-3.4.1 --with-openssl-opt=-g --with-pcre-opt=-g --with-zlib-opt=-g --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module
Reproduction steps
Use the fortio tool to send requests at 100 QPS to http://localhost:8090/example0 or http://localhost:8090/example2.
Call stack at the time of the exception
nginx.conf used for reproduction
#user nobody;
worker_processes 1;
daemon off;
#error_log logs/error.log;
#error_log logs/error.log notice;
error_log logs/error.log info;
#pid logs/nginx.pid;
events {
worker_connections 10240;
}
http {
include mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log logs/access.log main;
sendfile on;
#tcp_nopush on;
#keepalive_timeout 0;
keepalive_timeout 65;
#gzip on;
server {
listen 8090;
location /internal/ {
internal;
rewrite ^/internal/(.*)$ /$1 break;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_pass http://127.0.0.1:9090;
}
# good
location /example {
content_by_lua_block {
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1")-- 0.01s 返回结果
local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2")-- 0.2s 返回结果
ngx.say("example")
}
}
location /stopgc {
content_by_lua_block {
collectgarbage("stop")
ngx.say("stopgc")
}
}
# bad
location /example0 {
content_by_lua_block {
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1")-- 0.01s 返回结果
local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2")-- 0.2s 返回结果
ngx.thread.wait(t1)
ngx.say("example0")
ngx.exit(200)
}
}
# good
location /example1 {
content_by_lua_block {
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1")-- 0.01s 返回结果
local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2")-- 0.2s 返回结果
ngx.thread.wait(t1)
ngx.thread.wait(t2)
ngx.say("example1")
ngx.exit(200)
}
}
# bad
location /example2 {
content_by_lua_block {
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1")-- 0.01s 返回结果
local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2")-- 0.2s 返回结果
ngx.thread.wait(t1, t2)
ngx.say("example2")
ngx.exit(200)
}
}
# good
location /example3 {
content_by_lua_block {
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1")-- 0.01s 返回结果
local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2")-- 0.2s 返回结果
ngx.thread.wait(t1, t2)
ngx.say("example3")
}
}
# good
location /example4 {
content_by_lua_block {
local fetch = function(uri)
return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1")-- 0.01s 返回结果
local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2")-- 0.2s 返回结果
ngx.thread.wait(t1)
ngx.thread.wait(t2)
ngx.say("example4")
ngx.exit(200)
}
}
# good
location /example5 {
content_by_lua_block {
local func1 = function()
ngx.sleep(0.01)
ngx.say("t1: hello")
return "t1 done"
end
local func2 = function()
ngx.sleep(0.2)
ngx.say("t2: hello")
return "t2 done"
end
local t1 = ngx.thread.spawn(func1)
local t2 = ngx.thread.spawn(func2)
local ok, res = ngx.thread.wait(t1, t2)
if ok then
ngx.say("status: ", res.status, ", body: ", res.body)
else
ngx.say("not ok")
end
ngx.exit(ngx.OK)
}
}
}
}