lua-nginx-module

Memory leak and core dump caused by ngx.thread.spawn + ngx.location.capture + ngx.exit

Open yyqbuct opened this issue 11 months ago • 3 comments

Affected version

ngx_lua-0.10.13. The latest code is probably affected as well.

Reproduction case

local fetch = function(uri)
    return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/f1") -- subrequest f1 (responds after 0.1s)
local t2 = ngx.thread.spawn(fetch, "/f2") -- subrequest f2 (responds after 0.2s)
ngx.thread.wait(t1, t2)
ngx.say("example2")
ngx.exit(200)

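Stack at the time of the failure (the behavior is undefined: it may be a segmentation fault or an infinite loop)

lj_gc_step enters an infinite loop:

  • gc_onestep returns 0
  • lim -= 0, so lim stays greater than 0
  • g->gc.state never equals GCSpause

Image Image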
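The three observations above can be illustrated with a toy model of a step loop. This is only a sketch, not LuaJIT source: the names toy_gc, toy_onestep, toy_gc_step, and the `corrupted` and `max_iter` parameters are hypothetical, and the iteration cap exists only so that the model terminates.

```c
#include <assert.h>
#include <stdint.h>

enum { GCS_PAUSE = 0, GCS_SWEEP = 1 };   /* hypothetical phase names */

typedef struct {
    int state;       /* current collector phase */
    int work_left;   /* units of work still available in this phase */
} toy_gc;

/* Performs one unit of work and returns its cost. Returns 0 when nothing
 * is left to do. In the corrupted case the phase is never advanced to
 * GCS_PAUSE, which mirrors the situation reported above. */
int32_t toy_onestep(toy_gc *g, int corrupted)
{
    if (g->work_left > 0) {
        g->work_left--;
        if (g->work_left == 0 && !corrupted) {
            g->state = GCS_PAUSE;   /* a healthy collector ends the cycle */
        }
        return 10;
    }
    return 0;   /* no progress and no state change */
}

/* Returns the number of iterations used, or -1 if max_iter was exceeded
 * (a real loop without the cap would spin forever). */
int toy_gc_step(toy_gc *g, int32_t lim, int corrupted, int max_iter)
{
    int iter = 0;

    assert(max_iter > 0);
    do {
        if (++iter > max_iter) {
            return -1;
        }
        lim -= toy_onestep(g, corrupted);   /* lim -= 0 changes nothing */
        if (g->state == GCS_PAUSE) {
            return iter;                    /* cycle finished */
        }
    } while (lim > 0);

    return iter;                            /* step budget used up */
}
```

In this model the loop has only two exits: the budget lim reaching zero or the state reaching GCS_PAUSE. Once the step function returns 0 without changing the state, neither exit can be taken.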

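Analysis

  • ngx.thread.wait(t1, t2) returns as soon as light thread t1 finishes. The entry thread then calls ngx.exit(), and ngx_http_lua_handle_exit releases the request resources (because r->main->count is not 0, the request resources are actually leaked) and drops the references to light thread t2 and to the entry thread, which makes them eligible for garbage collection.
  • When light thread t2 returns later, its lua_State may already have been collected, in which case the behavior is undefined.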
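The sequence described in the analysis can be sketched as a toy model. It does not use nginx or LuaJIT data structures: toy_request, main_count (a stand-in for r->main->count), and all toy_* functions are hypothetical names, and the assumption that every pending capture holds one count on the main request is a simplification for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_THREADS 3   /* 0 = entry thread, 1 = t1, 2 = t2 */

typedef struct {
    bool referenced[TOY_THREADS];   /* still anchored, so the GC keeps it */
    bool collected[TOY_THREADS];    /* already reclaimed by the GC */
    int  main_count;                /* stand-in for r->main->count */
    bool freed;                     /* request resources released */
} toy_request;

void toy_init(toy_request *r, int captures)
{
    for (int i = 0; i < TOY_THREADS; i++) {
        r->referenced[i] = true;
        r->collected[i] = false;
    }
    r->main_count = 1 + captures;   /* the request itself plus its captures */
    r->freed = false;
}

/* Drops one hold on the request; it is freed only when the count hits 0. */
void toy_release(toy_request *r)
{
    assert(r->main_count > 0);
    if (--r->main_count == 0) {
        r->freed = true;
    }
}

/* Models ngx.exit(): all thread references are dropped unconditionally,
 * but the request survives while a capture still holds a count. */
void toy_exit(toy_request *r)
{
    for (int i = 0; i < TOY_THREADS; i++) {
        r->referenced[i] = false;
    }
    toy_release(r);
}

/* Models a GC cycle: every unreferenced thread is reclaimed. */
void toy_gc(toy_request *r)
{
    for (int i = 0; i < TOY_THREADS; i++) {
        if (!r->referenced[i]) {
            r->collected[i] = true;
        }
    }
}

/* Resuming a collected thread is the undefined behavior in the report. */
bool toy_can_resume(const toy_request *r, int tid)
{
    return !r->collected[tid];
}
```

Calling toy_init with two captures, then toy_release for the finished f1, then toy_exit and toy_gc leaves the request unfreed with a count of 1, and thread 2 can no longer be resumed safely when f2 completes.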

Coroutine scheduling diagram for the reproduction case

Image

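How to fix

Idea: prevent the problem up front. If ngx.exit is called while a capture has not finished yet, raise an error immediately. ngx.exit already contains logic that checks whether the request may be terminated: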

    if (ctx->no_abort
        && rc != NGX_ERROR
        && rc != NGX_HTTP_CLOSE
        && rc != NGX_HTTP_REQUEST_TIME_OUT
        && rc != NGX_HTTP_CLIENT_CLOSED_REQUEST)
    {
        return luaL_error(L, "attempt to abort with pending subrequests");
    }

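This check works when there is only one capture subrequest. With multiple captures, as in the reproduction case, it fails: the first capture to finish resets no_abort to 0 while another capture is still pending. The fix is to change no_abort from a boolean into a count of captures, incremented when a capture is created and decremented when it finishes. The code is as follows: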

diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
index 01ef2be..f8bfb2e 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
@@ -500,6 +500,9 @@ typedef struct ngx_http_lua_ctx_s {
 
     int                      uthreads; /* number of active user threads */
 
+    int                     no_aborts; /* prohibit "world abortion" via ngx.exit()
+                                          and etc */
+
     uint16_t                 context;   /* the current running directive context
                                            (or running phase) for the current
                                            Lua chunk */
@@ -538,9 +541,6 @@ typedef struct ngx_http_lua_ctx_s {
 
     unsigned         buffering:1; /* HTTP 1.0 response body buffering flag */
 
-    unsigned         no_abort:1; /* prohibit "world abortion" via ngx.exit()
-                                    and etc */
-
     unsigned         header_sent:1; /* r->header_sent is not sufficient for
                                      * this because special header filters
                                      * like ngx_image_filter may intercept
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
index 6ac2cbf..03883e1 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
@@ -354,7 +354,7 @@ ngx_http_lua_ngx_exit(lua_State *L)
 #endif
     }
 
-    if (ctx->no_abort
+    if (ctx->no_aborts
         && rc != NGX_ERROR
         && rc != NGX_HTTP_CLOSE
         && rc != NGX_HTTP_REQUEST_TIME_OUT
@@ -508,7 +508,7 @@ ngx_http_lua_ffi_exit(ngx_http_request_t *r, int status, u_char *err,
 #endif
     }
 
-    if (ctx->no_abort
+    if (ctx->no_aborts
         && status != NGX_ERROR
         && status != NGX_HTTP_CLOSE
         && status != NGX_HTTP_REQUEST_TIME_OUT
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
index 826a43c..f7d8de0 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
@@ -616,7 +616,7 @@ ngx_http_lua_ngx_location_capture_multi(lua_State *L)
         ngx_array_destroy(extra_vars);
     }
 
-    ctx->no_abort = 1;
+    ctx->no_aborts++;
 
     return lua_yield(L, 0);
 }
@@ -987,7 +987,7 @@ ngx_http_lua_post_subrequest(ngx_http_request_t *r, void *data, ngx_int_t rc)
     if (pr_coctx->pending_subreqs == 0) {
         dd("all subrequests are done");
 
-        pr_ctx->no_abort = 0;
+        pr_ctx->no_aborts--;
         pr_ctx->resume_handler = ngx_http_lua_subrequest_resume;
         pr_ctx->cur_co_ctx = pr_coctx;
     }
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
index f7a537e..8d8d0f8 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
@@ -1411,8 +1411,8 @@ user_co_done:
 
                 dd("headers sent? %d", r->header_sent || ctx->header_sent);
 
-                if (ctx->no_abort) {
-                    ctx->no_abort = 0;
+                if (ctx->no_aborts) {
+                    ctx->no_aborts--;
                     return NGX_ERROR;
                 }
 
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
index 7dcc6f7..c5f2d20 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
@@ -245,7 +245,7 @@ void ngx_http_lua_cleanup_free(ngx_http_request_t *r,

 
 #define ngx_http_lua_check_if_abortable(L, ctx)                              \
-    if ((ctx)->no_abort) {                                                   \
+    if ((ctx)->no_aborts) {                                                   \
         return luaL_error(L, "attempt to abort with pending subrequests");   \
     }
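The difference between the one-bit flag and the counter in the patch above can be sketched in isolation. This is a simplified model, not module code: toy_ctx and the toy_* functions are hypothetical, and both fields are kept side by side only so that the two behaviors can be compared on the same event sequence.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned no_abort:1;   /* original semantics: a single bit */
    int      no_aborts;    /* patched semantics: number of pending captures */
} toy_ctx;

/* A capture is issued (one call per light thread in the test case). */
void toy_capture_start(toy_ctx *c)
{
    c->no_abort = 1;
    c->no_aborts++;
}

/* All subrequests of one capture call have completed. */
void toy_capture_done(toy_ctx *c)
{
    assert(c->no_aborts > 0);
    c->no_abort = 0;       /* forgets that other captures may be pending */
    c->no_aborts--;
}

/* Would ngx.exit() be allowed under the original flag? */
bool toy_exit_allowed_flag(const toy_ctx *c)
{
    return !c->no_abort;
}

/* Would ngx.exit() be allowed under the patched counter? */
bool toy_exit_allowed_counter(const toy_ctx *c)
{
    return c->no_aborts == 0;
}
```

After two toy_capture_start calls and one toy_capture_done call, the flag variant already allows the exit while the counter variant still rejects it, which is the gap the patch is meant to close.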

yyqbuct avatar Feb 12 '25 07:02 yyqbuct

Could you verify this on the latest ngx_lua version?

zhuizhuhaomeng avatar Feb 12 '25 10:02 zhuizhuhaomeng

> Could you verify this on the latest ngx_lua version?

Sure, I will find some time to try reproducing it on the latest version as well.

yyqbuct avatar Feb 13 '25 02:02 yyqbuct

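> Could you verify this on the latest ngx_lua version?

@zhuizhuhaomeng could you take a look when you have time? Thanks.

Conclusion

The issue reproduces on the latest OpenResty 1.27.1.1 (ngx_lua-0.10.27).

Build environment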

Linux 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Build command (nginx -V output)

nginx version: openresty/1.27.1.1
built by gcc 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
built with OpenSSL 3.4.1 11 Feb 2025
TLS SNI support enabled
configure arguments: --prefix=/home/yyq/workspace/nginx/openresty-1.27.1.1-bin/nginx --with-debug --with-cc-opt='-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2' --add-module=../ngx_devel_kit-0.3.3 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.27 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.37 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.20 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../rds-json-nginx-module-0.17 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.15 --with-ld-opt=-Wl,-rpath,/home/yyq/workspace/nginx/openresty-1.27.1.1-bin/luajit/lib --with-zlib=/home/yyq/workspace/nginx/zlib-1.2.13 --with-pcre=/home/yyq/workspace/nginx/pcre-8.45 --with-openssl=/home/yyq/workspace/nginx/openssl-3.4.1 --with-openssl-opt=-g --with-pcre-opt=-g --with-zlib-opt=-g --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module

Reproduction steps

Use the fortio tool to send requests at 100 QPS to http://localhost:8090/example0 or http://localhost:8090/example2

Call stack at the time of the failure

Image

nginx.conf used for the reproduction


#user  nobody;
worker_processes  1;
daemon off;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  10240;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    server {
        listen       8090;

        location /internal/ {
            internal;
            rewrite ^/internal/(.*)$ /$1 break;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_pass http://127.0.0.1:9090;
        }

        # good
        location /example {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.say("example")
            }
        }

        location /stopgc {
            content_by_lua_block {
                collectgarbage("stop")
                ngx.say("stopgc")
            }
        }

        # bad
        location /example0 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end

                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1)
                ngx.say("example0")
                ngx.exit(200)
            }
        }

        # good
        location /example1 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1)
                ngx.thread.wait(t2)
                ngx.say("example1")
                ngx.exit(200)
            }
        }

        # bad
        location /example2 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1, t2)
                ngx.say("example2")
                ngx.exit(200)
            }
        }

        # good
        location /example3 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1, t2)
                ngx.say("example3")
            }
        }

        # good
        location /example4 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1)
                ngx.thread.wait(t2)
                ngx.say("example4")
                ngx.exit(200)
            }
        }

        # good
        location /example5 {
            content_by_lua_block {
                local func1 = function()
                    ngx.sleep(0.01)
                    ngx.say("t1: hello")
                    return "t1 done"
                end
                local func2 = function()
                    ngx.sleep(0.2)
                    ngx.say("t2: hello")
                    return "t2 done"
                end
                local t1 = ngx.thread.spawn(func1)
                local t2 = ngx.thread.spawn(func2)
                local ok, res = ngx.thread.wait(t1, t2)
                if ok then
                    ngx.say("status: ", res.status, ", body: ", res.body)
                else
                    ngx.say("not ok")
                end
                ngx.exit(ngx.OK)
            }
        }
    }
}

yyqbuct avatar Feb 13 '25 03:02 yyqbuct