Signal 11 (core dumped) about 3 minutes after making a call
Bug Overview
When we make a call to a Flask app, we get a core dump around 3 minutes later. This is present in versions 1.32.2, 1.33.0, 1.34.0, and 1.34.2. The call we are making performs an insert or a delete against a MySQL database using mysql-connector-python.
Expected Behavior
The server should not core dump.
Steps to Reproduce the Bug
Run the application and make a POST or DELETE call.
Environment Details
- Target deployment platform: local server in a docker container
- Target OS: Ubuntu Desktop 22.04
- Version of this project or specific commit: 1.34.2
- Version of any relevant project languages: Python 3.11.2, Docker version 24.0.7, build afdd53b
Attachments: listener.json, config.json, Dockerfile.txt, docker-entrypoint.txt
Additional Context
Debug log: server.txt
Hi, thanks for the report.
> when we make call to a flask app we get a core dump around 3 mins after, this is present in versions 1.32.2, 1.33.0, 1.34.0, 1.34.2
Interesting, so IIUC, you can make a single request to the application, wait ~3 minutes, and then the application process will crash?
Is this 100% the case, and is it always after around 3 minutes?
The good news is you're getting a core dump (as I don't expect you have a handy reproducer).
Have you worked with core dumps before? I would really love a backtrace...
I'm going to assume you know where the core dumps are going, so next time you get one, can you simply do
$ gdb /path/to/unitd /path/to/coredump
...
(gdb) bt
and paste the output. You may or may not get symbols displayed, and you may need to install debuginfo packages for unit/python, but let's see what we get first.
Another quick check would be to see if the problem happens without threads, i.e. comment out "threads": 4, (or change it to 1) in the config.
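As an aside, if the dumps don't show up where you expect, bear in mind that where core dumps land is decided by the kernel's core_pattern, not by Unit, and that setting is not namespaced, so inside a container it is the host's value that applies. A quick check (no Unit-specific paths assumed):

```shell
# Where core dumps go is controlled by the kernel, not the crashing process.
# A plain path/pattern means files are written there; a leading '|' means
# dumps are piped to a helper such as systemd-coredump or apport.
cat /proc/sys/kernel/core_pattern

# The soft core-size limit must be non-zero ("unlimited" is typical)
# for a dump to be written at all.
ulimit -c
```

If core_pattern starts with `|`, look the dumps up via that helper (e.g. `coredumpctl` for systemd-coredump) rather than on disk.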
We can reproduce the issue very easily, though the core dump files don't seem to be getting saved anywhere.
Let me know if you need more, as I've not done this before.
(gdb) bt
#0 0x00006356a3f4a093 in ?? ()
#1 0x00006356a3f4a1c9 in ?? ()
#2 0x00006356a3f3200e in nxt_event_engine_start ()
#3 0x00006356a3f2fcc6 in ?? ()
#4 0x00007997955671f5 in start_thread (arg=
Also, we tested with 1 thread and the issue is still present.
The Flask server we are running is an OpenAPI server; just connecting to the Swagger page causes the core dump, so it's not linked to doing a POST.
2025/05/02 13:27:35 [info] 198#198 router started
2025/05/02 13:27:35 [info] 198#198 OpenSSL 3.0.15 3 Sep 2024, 300000f0
2025/05/02 13:27:35 [info] 199#199 "flask" prototype started
2025/05/02 13:27:35 [info] 200#200 "flask" application started
2025/05/02 13:27:36 [info] 226#226 "flask" application started
2025/05/02 13:27:36 [info] 252#252 "flask" application started
2025/05/02 13:27:36 [info] 278#278 "flask" application started
172.18.0.1 - - [02/May/2025:13:29:37 +0000] "GET /resource_manager/ui/ HTTP/1.1" 200 1498 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "0.032"
172.18.0.1 - - [02/May/2025:13:29:37 +0000] "GET /resource_manager/ui/swagger-ui.css HTTP/1.1" 200 143669 "https://localhost:20018/resource_manager/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "0.012"
172.18.0.1 - - [02/May/2025:13:29:37 +0000] "GET /resource_manager/ui/swagger-ui-bundle.js HTTP/1.1" 200 1091405 "https://localhost:20018/resource_manager/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "0.036"
172.18.0.1 - - [02/May/2025:13:29:38 +0000] "GET /resource_manager/ui/swagger-ui-standalone-preset.js HTTP/1.1" 200 337216 "https://localhost:20018/resource_manager/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "0.004"
172.18.0.1 - - [02/May/2025:13:29:38 +0000] "GET /resource_manager/ui/favicon-32x32.png HTTP/1.1" 200 628 "https://localhost:20018/resource_manager/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "0.004"
172.18.0.1 - - [02/May/2025:13:29:38 +0000] "GET /resource_manager/openapi.json HTTP/1.1" 200 55553 "https://localhost:20018/resource_manager/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "0.063"
2025/05/02 13:32:34 [alert] 7#7 process 198 exited on signal 11 (core dumped)
2025/05/02 13:32:34 [info] 336#336 router started
2025/05/02 13:32:34 [info] 336#336 OpenSSL 3.0.15 3 Sep 2024, 300000f0
2025/05/02 13:32:34 [info] 345#345 "flask" prototype started
2025/05/02 13:32:34 [info] 346#346 "flask" application started
2025/05/02 13:32:34 [notice] 199#199 app process 252 exited with code 0
2025/05/02 13:32:34 [notice] 199#199 app process 278 exited with code 0
2025/05/02 13:32:34 [notice] 199#199 app process 200 exited with code 0
2025/05/02 13:32:34 [notice] 199#199 app process 226 exited with code 0
2025/05/02 13:32:34 [notice] 7#7 process 199 exited with code 0
2025/05/02 13:32:34 [info] 372#372 "flask" application started
2025/05/02 13:32:34 [info] 398#398 "flask" application started
2025/05/02 13:32:35 [info] 424#424 "flask" application started
Thanks for the backtrace and the log output.
It actually looks like it's the router process that's crashing and not the application process.
Unfortunately, as feared, you are missing most of the debug symbols. Let's see if we can fix that.
Assuming you installed from packages, you should be able to get debuginfo by installing the unit-dbg & unit-python-3.11-dbg (just in case) packages.
Then (and you can probably use the same coredump) could you provide the output from both the bt & bt full gdb commands? Thanks.
I'm getting the following when installing unit-python-3.11-dbg:
root@ee67afd20b2b:/tmp# apt install unit-python-3.11-dbg
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package unit-python-3.11-dbg
E: Couldn't find any package by glob 'unit-python-3.11-dbg'
Here is the backtrace with just unit-dbg installed:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `unit: router '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 nxt_h1p_complete_buffers (task=task@entry=0x72a9e4001620, h1p=h1p@entry=0x6255219bc340, all=all@entry=1) at src/nxt_h1proto.c:1522
1522 src/nxt_h1proto.c: No such file or directory.
[Current thread is 1 (Thread 0x72a9ea5266c0 (LWP 305))]
(gdb) bt
#0 nxt_h1p_complete_buffers (task=task@entry=0x72a9e4001620, h1p=h1p@entry=0x6255219bc340, all=all@entry=1) at src/nxt_h1proto.c:1522
#1 0x000062550e4cb46f in nxt_h1p_shutdown (task=0x72a9e4001620, c=0x72a9e4001550) at src/nxt_h1proto.c:2129
#2 0x000062550e4b75c2 in nxt_event_engine_start (engine=0x6255219bc0b0) at src/nxt_event_engine.c:542
#3 0x000062550e4b5bf1 in nxt_thread_trampoline (data=0x6255219339e0) at src/nxt_thread.c:126
#4 0x000072a9ebb621f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x000072a9ebbe1b00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0 nxt_h1p_complete_buffers (task=task@entry=0x72a9e4001620, h1p=h1p@entry=0x6255219bc340, all=all@entry=1) at src/nxt_h1proto.c:1522
size = <optimized out>
b = 0x0
in = <optimized out>
next = <optimized out>
c = 0x0
#1 0x000062550e4cb46f in nxt_h1p_shutdown (task=0x72a9e4001620, c=0x72a9e4001550) at src/nxt_h1proto.c:2129
timer = <optimized out>
h1p = 0x6255219bc340
#2 0x000062550e4b75c2 in nxt_event_engine_start (engine=0x6255219bc0b0) at src/nxt_event_engine.c:542
obj = 0x72a9e40015b8
data = 0x0
task = 0x72a9e4001620
timeout = <optimized out>
now = <optimized out>
thr = <optimized out>
handler = 0x62550e4b7a80 <nxt_timer_handler>
#3 0x000062550e4b5bf1 in nxt_thread_trampoline (data=0x6255219339e0) at src/nxt_thread.c:126
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {126074106308288, 255495596939513932, -1368, 11, 140734449295984, 126074097917952, -1812585321242455988, -4097801277494816692}, __mask_was_saved = 0}}, __pad = {0x72a9ea525a30, 0x0, 0x0, 0x0}}
__cancel_routine = 0x62550e4b5a60 <nxt_thread_time_cleanup>
__cancel_arg = <optimized out>
__not_first_call = <optimized out>
thr = <optimized out>
link = 0x6255219339e0
start = 0x62550e4c0c90 <nxt_router_thread_start>
#4 0x000072a9ebb621f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#5 0x000072a9ebbe1b00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
Fixed this; the issue was the package name, it's python3.11-dbg.
Backtrace with both unit-dbg and python3.11-dbg installed:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `unit: router '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 nxt_h1p_complete_buffers (task=task@entry=0x7a1904001620, h1p=h1p@entry=0x636b874f99f0, all=all@entry=1) at src/nxt_h1proto.c:1522
1522 src/nxt_h1proto.c: No such file or directory.
[Current thread is 1 (Thread 0x7a1913fff6c0 (LWP 306))]
(gdb) bt
#0 nxt_h1p_complete_buffers (task=task@entry=0x7a1904001620, h1p=h1p@entry=0x636b874f99f0, all=all@entry=1) at src/nxt_h1proto.c:1522
#1 0x0000636b7063246f in nxt_h1p_shutdown (task=0x7a1904001620, c=0x7a1904001550) at src/nxt_h1proto.c:2129
#2 0x0000636b7061e5c2 in nxt_event_engine_start (engine=0x636b874f9760) at src/nxt_event_engine.c:542
#3 0x0000636b7061cbf1 in nxt_thread_trampoline (data=0x636b87467350) at src/nxt_thread.c:126
#4 0x00007a191c65c1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007a191c6dbb00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0 nxt_h1p_complete_buffers (task=task@entry=0x7a1904001620, h1p=h1p@entry=0x636b874f99f0, all=all@entry=1) at src/nxt_h1proto.c:1522
size = <optimized out>
b = 0x0
in = <optimized out>
next = <optimized out>
c = 0x0
#1 0x0000636b7063246f in nxt_h1p_shutdown (task=0x7a1904001620, c=0x7a1904001550) at src/nxt_h1proto.c:2129
timer = <optimized out>
h1p = 0x636b874f99f0
#2 0x0000636b7061e5c2 in nxt_event_engine_start (engine=0x636b874f9760) at src/nxt_event_engine.c:542
obj = 0x7a19040015b8
data = 0x0
task = 0x7a1904001620
timeout = <optimized out>
now = <optimized out>
thr = <optimized out>
handler = 0x636b7061ea80 <nxt_timer_handler>
#3 0x0000636b7061cbf1 in nxt_thread_trampoline (data=0x636b87467350) at src/nxt_thread.c:126
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {134248128313024, 5039262364208667989, -1368, 11, 140726245626752, 134248119922688, -5630298242980307627, -8990892774217671339}, __mask_was_saved = 0}}, __pad = {0x7a1913ffea30, 0x0, 0x0, 0x0}}
__cancel_routine = 0x636b7061ca60 <nxt_thread_time_cleanup>
__cancel_arg = <optimized out>
__not_first_call = <optimized out>
thr = <optimized out>
link = 0x636b87467350
start = 0x636b70627c90 <nxt_router_thread_start>
#4 0x00007a191c65c1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#5 0x00007a191c6dbb00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
Thanks.
#0 nxt_h1p_complete_buffers (task=task@entry=0x7a1904001620, h1p=h1p@entry=0x636b874f99f0, all=all@entry=1) at src/nxt_h1proto.c:1522
size = <optimized out>
b = 0x0
in = <optimized out>
next = <optimized out>
c = 0x0
c is NULL here which we're not expecting.
If you are able to, you could try this patch and see if anything else falls out...
diff --git ./src/nxt_h1proto.c ./src/nxt_h1proto.c
index 9a9ad553..d0e03077 100644
--- ./src/nxt_h1proto.c
+++ ./src/nxt_h1proto.c
@@ -1519,7 +1519,7 @@ nxt_h1p_complete_buffers(nxt_task_t *task, nxt_h1proto_t *h1p, nxt_bool_t all)
     b = h1p->buffers;
     c = h1p->conn;
-    in = c->read;
+    in = c ? c->read : NULL;
 
     if (b != NULL) {
         if (in == NULL) {
Just to confirm again.
If you hit the problematic page, then after about 3 minutes, Unit crashes?
It's only that one page that causes the crash?
If you leave the page loaded in the browser, does it survive?
We make a call to the server; this could come from a Python process using python requests, but in our tests here we are using a browser. After making the call, after about 3 minutes we get the crash. The webpage looks fine and we can continue making requests; if we refresh the page, we get another crash after 3 minutes.
OK, I tested the patch and I still have the same issue.
Latest backtrace:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `unit: router '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 nxt_h1p_complete_buffers (task=task@entry=0x732064001620, h1p=h1p@entry=0x584bc2be89c0, all=all@entry=1) at src/nxt_h1proto.c:1522
1522 src/nxt_h1proto.c: No such file or directory.
[Current thread is 1 (Thread 0x73206b5b06c0 (LWP 297))]
(gdb) bt
#0 nxt_h1p_complete_buffers (task=task@entry=0x732064001620, h1p=h1p@entry=0x584bc2be89c0, all=all@entry=1) at src/nxt_h1proto.c:1522
#1 0x0000584ba231b46f in nxt_h1p_shutdown (task=0x732064001620, c=0x732064001550) at src/nxt_h1proto.c:2129
#2 0x0000584ba23075c2 in nxt_event_engine_start (engine=0x584bc2be8730) at src/nxt_event_engine.c:542
#3 0x0000584ba2305bf1 in nxt_thread_trampoline (data=0x584bc2b6a630) at src/nxt_thread.c:126
#4 0x000073206c3eb1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x000073206c46ab00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0 nxt_h1p_complete_buffers (task=task@entry=0x732064001620, h1p=h1p@entry=0x584bc2be89c0, all=all@entry=1) at src/nxt_h1proto.c:1522
size = <optimized out>
b = 0x0
in = <optimized out>
next = <optimized out>
c = 0x0
#1 0x0000584ba231b46f in nxt_h1p_shutdown (task=0x732064001620, c=0x732064001550) at src/nxt_h1proto.c:2129
timer = <optimized out>
h1p = 0x584bc2be89c0
#2 0x0000584ba23075c2 in nxt_event_engine_start (engine=0x584bc2be8730) at src/nxt_event_engine.c:542
obj = 0x7320640015b8
data = 0x0
task = 0x732064001620
timeout = <optimized out>
now = <optimized out>
thr = <optimized out>
handler = 0x584ba2307a80 <nxt_timer_handler>
#3 0x0000584ba2305bf1 in nxt_thread_trampoline (data=0x584bc2b6a630) at src/nxt_thread.c:126
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {126583077275328, 1042248611634940760, -1368, 11, 140720990601120, 126583068884992, -1714154970403239080, -4692315440823465128}, __mask_was_saved = 0}}, __pad = {0x73206b5afa30, 0x0, 0x0, 0x0}}
__cancel_routine = 0x584ba2305a60 <nxt_thread_time_cleanup>
__cancel_arg = <optimized out>
__not_first_call = <optimized out>
thr = <optimized out>
link = 0x584bc2b6a630
start = 0x584ba2310c90 <nxt_router_thread_start>
#4 0x000073206c3eb1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#5 0x000073206c46ab00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
Weird. Are you absolutely sure you tested the patched version?
With that patch, in this case, b and in should both then be NULL. We should then just skip past both of the outer if()s; the key bit is that if c is NULL, we won't attempt to read c->read.
Are websockets involved?
Actually, would it be possible for you to test current master?
There is a remote possibility it may make a difference... (there are some h1proto-related changes...)
So it seems my patched version was being replaced when building the image. I have fixed this, and I do still get a core dump, but it's different.
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `unit: router '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005ca23e878d53 in nxt_rbtree_branch_min (node=0x4, tree=<optimized out>) at src/nxt_rbtree.h:70
70 src/nxt_rbtree.h: No such file or directory.
[Current thread is 1 (Thread 0x7ba386edf6c0 (LWP 297))]
(gdb) bt
#0 0x00005ca23e878d53 in nxt_rbtree_branch_min (node=0x4, tree=<optimized out>) at src/nxt_rbtree.h:70
#1 nxt_rbtree_delete (tree=tree@entry=0x5ca27462b160, part=part@entry=0x7ba380001618) at src/nxt_rbtree.c:305
#2 0x00005ca23e84dbc9 in nxt_timer_changes_commit (engine=0x5ca27462af30) at src/nxt_timer.c:201
#3 0x00005ca23e84df38 in nxt_timer_find (engine=engine@entry=0x5ca27462af30) at src/nxt_timer.c:241
#4 0x00005ca23e84d7cc in nxt_event_engine_start (engine=0x5ca27462af30) at src/nxt_event_engine.c:547
#5 0x00005ca23e84bdc1 in nxt_thread_trampoline (data=0x5ca2745ad570) at src/nxt_thread.c:126
#6 0x00007ba387d1c1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00007ba387d9bb00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0 0x00005ca23e878d53 in nxt_rbtree_branch_min (node=0x4, tree=<optimized out>) at src/nxt_rbtree.h:70
No locals.
#1 nxt_rbtree_delete (tree=tree@entry=0x5ca27462b160, part=part@entry=0x7ba380001618) at src/nxt_rbtree.c:305
color = <optimized out>
node = 0x7ba380001618
sentinel = 0x5ca27462b160
subst = 0x7ba380001618
child = <optimized out>
#2 0x00005ca23e84dbc9 in nxt_timer_changes_commit (engine=0x5ca27462af30) at src/nxt_timer.c:201
timer = 0x7ba380001618
timers = 0x5ca27462b160
ch = 0x5ca274633dd0
end = 0x5ca274633de0
add = 0x5ca274633dd0
add_end = 0x5ca274633de0
#3 0x00005ca23e84df38 in nxt_timer_find (engine=engine@entry=0x5ca27462af30) at src/nxt_timer.c:241
delta = <optimized out>
time = <optimized out>
timer = <optimized out>
timers = 0x5ca27462b160
tree = <optimized out>
node = <optimized out>
next = <optimized out>
#4 0x00005ca23e84d7cc in nxt_event_engine_start (engine=0x5ca27462af30) at src/nxt_event_engine.c:547
obj = 0x7ba380001550
data = 0x5ca27462af30
task = 0x0
timeout = <optimized out>
now = <optimized out>
thr = <optimized out>
handler = <optimized out>
#5 0x00005ca23e84bdc1 in nxt_thread_trampoline (data=0x5ca2745ad570) at src/nxt_thread.c:126
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {135942273627840, -165853751974725296, -688, 11, 140736538951552, 135942265237504,
789758586193541456, 4969363667880999248}, __mask_was_saved = 0}}, __pad = {0x7ba386edecf0, 0x0, 0x0, 0x0}}
__cancel_routine = 0x5ca23e84bc30 <nxt_thread_time_cleanup>
__cancel_arg = <optimized out>
__not_first_call = <optimized out>
thr = <optimized out>
link = 0x5ca2745ad570
start = 0x5ca23e856e60 <nxt_router_thread_start>
#6 0x00007ba387d1c1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#7 0x00007ba387d9bb00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
> Actually, would it be possible for you to test current master?
> There is a remote possibility it may make a difference... (there are some h1proto-related changes...)
I'll try this later. Did you want me to include the patch? I assume I will need to build the Python module as well?
Just tested on the master branch; same issue. This was including the patch.
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `unit: router '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005a96708f5403 in nxt_rbtree_branch_min (node=0x4, tree=<optimized out>) at src/nxt_rbtree.h:70
70 src/nxt_rbtree.h: No such file or directory.
[Current thread is 1 (Thread 0x7e0dbffff6c0 (LWP 301))]
(gdb) bt
#0 0x00005a96708f5403 in nxt_rbtree_branch_min (node=0x4, tree=<optimized out>) at src/nxt_rbtree.h:70
#1 nxt_rbtree_delete (tree=tree@entry=0x5a9687477d30, part=part@entry=0x7e0db4001618) at src/nxt_rbtree.c:305
#2 0x00005a96708c8ae9 in nxt_timer_changes_commit (engine=0x5a9687477b00) at src/nxt_timer.c:201
#3 0x00005a96708c8e58 in nxt_timer_find (engine=engine@entry=0x5a9687477b00) at src/nxt_timer.c:241
#4 0x00005a96708c86ec in nxt_event_engine_start (engine=0x5a9687477b00) at src/nxt_event_engine.c:547
#5 0x00005a96708c6ce1 in nxt_thread_trampoline (data=0x5a96873ddec0) at src/nxt_thread.c:126
#6 0x00007e0dc906f1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00007e0dc90eeb00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0 0x00005a96708f5403 in nxt_rbtree_branch_min (node=0x4, tree=<optimized out>) at src/nxt_rbtree.h:70
No locals.
#1 nxt_rbtree_delete (tree=tree@entry=0x5a9687477d30, part=part@entry=0x7e0db4001618) at src/nxt_rbtree.c:305
color = <optimized out>
node = 0x7e0db4001618
sentinel = 0x5a9687477d30
subst = 0x7e0db4001618
child = <optimized out>
#2 0x00005a96708c8ae9 in nxt_timer_changes_commit (engine=0x5a9687477b00) at src/nxt_timer.c:201
timer = 0x7e0db4001618
timers = 0x5a9687477d30
ch = 0x5a96874809a0
end = 0x5a96874809b0
add = 0x5a96874809a0
add_end = 0x5a96874809b0
#3 0x00005a96708c8e58 in nxt_timer_find (engine=engine@entry=0x5a9687477b00) at src/nxt_timer.c:241
delta = <optimized out>
time = <optimized out>
timer = <optimized out>
timers = 0x5a9687477d30
tree = <optimized out>
node = <optimized out>
next = <optimized out>
#4 0x00005a96708c86ec in nxt_event_engine_start (engine=0x5a9687477b00) at src/nxt_event_engine.c:547
obj = 0x7e0db4001550
data = 0x5a9687477b00
task = 0x0
timeout = <optimized out>
now = <optimized out>
thr = <optimized out>
handler = <optimized out>
#5 0x00005a96708c6ce1 in nxt_thread_trampoline (data=0x5a96873ddec0) at src/nxt_thread.c:126
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {138597520897728, -7122281061378609134, -1376, 11, 140722096742176, 138597512507392,
7004202308968460306, 2883555026879328274}, __mask_was_saved = 0}}, __pad = {0x7e0dbfffea30, 0x0, 0x0, 0x0}}
__cancel_routine = 0x5a96708c6b50 <nxt_thread_time_cleanup>
__cancel_arg = <optimized out>
__not_first_call = <optimized out>
thr = <optimized out>
link = 0x5a96873ddec0
start = 0x5a96708d1d80 <nxt_router_thread_start>
#6 0x00007e0dc906f1f5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#7 0x00007e0dc90eeb00 in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
Thanks for all the testing.
The only "3 minutes" thing I can immediately see in Unit is the idle_timeout setting:
> Maximum number of seconds between requests in a keep-alive connection. If no new requests arrive within this interval, Unit returns a 408 "Request Timeout" response and closes the connection.
So it looks like we are hitting this and then trying to shutdown the connection (which matches the location of the crash).
Looks like I'll need to try and figure out how to reproduce this.
One last question for now, could you paste the headers being sent by the browser?
I'll see about uploading some files so you can build an image with the issue in.
Though, is there currently anything we can do to stop this happening so much?
> I'll see about uploading some files so you can build an image with the issue in.
Thanks!
> Though, is there currently anything we can do to stop this happening so much?
Unfortunately it seems not.
Though there is one thing you could test, which might help narrow the problem down: try adding the following to your settings.http Unit config:
"idle_timeout": 60
See if that makes the crashes happen after only a minute.
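For reference, that option lives under settings.http in the Unit configuration; a minimal fragment would look like this (everything else in your config stays as it is):

```json
{
    "settings": {
        "http": {
            "idle_timeout": 60
        }
    }
}
```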
Yes, changing that setting does make it core dump in just over 60 seconds.
OK, thanks for testing. So it certainly seems to be something related to keep-alive requests; I'm just surprised we haven't seen this before...
So in an attempt to reproduce this in the meantime...
$ telnet localhost 8000
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost:8000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:138.0) Gecko/20100101 Firefox/138.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Priority: u=0, i
HTTP/1.1 200 OK
content-type: text/plain
Server: Unit/1.34.2
Date: Tue, 06 May 2025 15:32:44 GMT
Transfer-Encoding: chunked
b
Testing...
0
/*
* After 3 minutes
*/
HTTP/1.1 408 Request Timeout
Server: Unit/1.34.2
Connection: close
Content-Length: 0
Date: Tue, 06 May 2025 15:35:44 GMT
Connection closed by foreign host.
So things seem to have worked as expected: after 3 minutes, Unit terminated the connection with a 408.
I'll keep prodding...
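For anyone wanting to script that probe rather than drive telnet by hand, here is a rough sketch. To keep it self-contained it runs against a throwaway local stdlib server, not Unit; the server address (and a sleep past idle_timeout after the request, omitted here) are the bits you would swap in to point it at a real Unit instance:

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # HTTP/1.1 means keep-alive by default

    def do_GET(self):
        body = b"Testing...\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

# Throwaway local server standing in for Unit; port 0 = pick a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Raw-socket HTTP/1.1 keep-alive request, like the telnet session above.
# Against Unit: connect to its host:port, then time.sleep() past
# idle_timeout and read again to watch for the 408 (or the crash).
s = socket.create_connection(server.server_address)
s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\n\r\n")
status = s.recv(4096).split(b"\r\n", 1)[0]
print(status.decode())   # HTTP/1.1 200 OK
s.close()
server.shutdown()
```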
The attached zip file contains a project that is able to recreate the issue every time.
Extract it into a folder, then run the following:
make build-server
make build-image
make deploy-server
This does require Docker and will result in a container running on port 443. In a browser, go to https://server-ip; this should end up displaying an OpenAPI webpage. Then wait 60 seconds. This also has your patch included.
Thanks, let me see if I can get it going without docker...
Quick question, where's the actual application that's being run?
So make build-server runs the OpenAPI generator, which creates all the server code; once you run it, you will see a folder called server with all that code in it. You can run this outside of Docker, though you will need the OpenAPI generator jar file:
https://repo1.maven.org/maven2/org/openapitools/openapi-generator-cli/7.0.1/openapi-generator-cli-7.0.1.jar
then run
java -jar /openapi-generator-cli-7.0.1.jar generate \
    -t .openapi-generator-server/ \
    -i openapi.yaml \
    -g python-flask \
    -o server/
OK, thanks. So, a considerable number of Python packages later, I have it running.
Anything specific I need to do?
Basically just open it in a browser:
https://localhost
This should redirect to https://localhost/openapi_examples/ui/.
Then just wait.