the unconnected:sendto method causes CPU-100%

Open mz198 opened this issue 5 years ago • 9 comments

Hi~ I found that the unconnected:sendto method gets trapped in an infinite loop when FD resources are exhausted.

mz198 avatar Apr 15 '20 01:04 mz198

On which platform? Can you create a small self-contained test and post for us to see?

diegonehab avatar Apr 15 '20 01:04 diegonehab

Hi~ diegonehab ! Here is the test code on Linux platform:

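local socket = require "socket"

-- Destination for the test datagrams
local reporter = {
    HOST = "127.0.0.1",
    PORT = 9500
}

-- Keep a reference to every socket so that none of them is ever closed
local array = {}
for i = 1, 10000000 do
    print(i)
    array[i] = socket.udp()
    local content = string.format("EEE %d %d %d", 1, 1, 1)
    -- Send one datagram from the new, unconnected socket
    array[i]:sendto(content, reporter.HOST, reporter.PORT)
    print('finish!')
end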

I keep every socket object in a table and never close any of them. Execution results:

1
finish!
2
finish!
3
finish!
......
28223
finish!
28224
finish!
28225
finish!
28226
finish!
28227

mz198 avatar Apr 15 '20 02:04 mz198

Trying here seems to go on forever. :/ Looking at the source code, I also can't see how this could possibly enter an infinite loop, unless getaddrinfo returns a cyclical linked list. Can you double-check?

finish!
1257645
finish!
1257646
finish!
1257647
finish!
1257648
finish!
1257649
finish!
1257650
finish!
1257651
finish!
1257652

diegonehab avatar Apr 15 '20 03:04 diegonehab

Hi~ diegonehab ! I found that the number of socket objects is limited by the Linux network setting net.ipv4.ip_local_port_range.

$ cat /proc/sys/net/ipv4/ip_local_port_range 
10000	60000

I have now changed ip_local_port_range to 10000-60000, and the test code execution result is:

49991
finish!
49992
finish!
49993
finish!
49994
finish!
49995
finish!
49996
finish!
49997

I think you can reproduce this problem by changing the net.ipv4.ip_local_port_range.
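As a rough check on the numbers above: on Linux, an unconnected UDP socket is assigned a local port from ip_local_port_range the first time it sends, so the size of that range caps how many such sockets can send at once. The sketch below computes that size from text in the format of /proc/sys/net/ipv4/ip_local_port_range. The function name ephemeral_port_count is hypothetical and is not part of luasocket.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of luasocket): parse text in the format of
 * /proc/sys/net/ipv4/ip_local_port_range ("low<whitespace>high") and return
 * the number of ports in the inclusive range, or -1 if the text is malformed.
 */
int ephemeral_port_count(const char *text)
{
    int low = 0;
    int high = 0;

    assert(text != NULL);

    /* A space in the format string matches any run of whitespace, tabs included. */
    if (sscanf(text, "%d %d", &low, &high) != 2) {
        return -1;
    }
    if (low < 0 || high > 65535 || low > high) {
        return -1;
    }
    return high - low + 1;
}
```

For the range 10000 to 60000 shown above this gives 50001 ports, which is close to the 49997 sockets the test created before it stalled; ports already in use by other processes could plausibly account for the small gap.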

mz198 avatar Apr 15 '20 04:04 mz198

And here is the flame graph. image

mz198 avatar Apr 15 '20 05:04 mz198

Linux (Gentoo, kernel 5.4.15): Unable to reproduce any problem. The above script proceeds consistently and rapidly through the count space and finishes. My /proc/sys/net/ipv4/ip_local_port_range is set to 32768 60999.

MacOS (10.15.3 Catalina): Proceeds quickly up to about 16,000 and then slows down considerably and seemingly increasingly for each iteration. I was too impatient to let it complete.

ewestbrook avatar Apr 15 '20 14:04 ewestbrook

CentOS 7.2, kernel 3.10.0

mz198 avatar Apr 15 '20 15:04 mz198

Can you use a debugger to see which function is blocking forever? It could be the DNS lookup or it could be sendto itself. In that case, I’m not sure what we could do about it. But will keep thinking.

diegonehab avatar Apr 15 '20 16:04 diegonehab

Sorry, I used GDB to attach to my test process, but I can’t get more information about locals because no debugging symbols were found. I set breakpoints on some functions and found that socket_sendto is trapped in an infinite loop: the <socket_sendto> breakpoint is never hit again, while the socket_waitfd, poll and sendto breakpoints are hit repeatedly.

(gdb) attach 22038
Attaching to process 22038
Reading symbols from /usr/local/openresty/luajit/bin/luajit...done.
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/lib/lua/5.1/socket/core.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/lua/5.1/socket/core.so
0x00007fba96750833 in __sendto_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install kong-1.4.0-1.x86_64
(gdb) b meth_sendto
Breakpoint 1 at 0x7fba96452920
(gdb) b socket_sendto
Breakpoint 2 at 0x7fba964531f0
(gdb) b socket_waitfd
Breakpoint 3 at 0x7fba96452e40
(gdb) b poll
Breakpoint 4 at 0x7fba96744e00
(gdb) b sendto
Breakpoint 5 at 0x7fba96750820
(gdb) s
Single stepping until exit from function sendto,
which has no line number information.
0x00007fba9645323b in socket_sendto () from /usr/local/lib/lua/5.1/socket/core.so
(gdb) s
Single stepping until exit from function socket_sendto,
which has no line number information.
......
(gdb) s
Single stepping until exit from function sendto,
which has no line number information.
0x00007fba9645323b in socket_sendto () from /usr/local/lib/lua/5.1/socket/core.so
(gdb) bt
#0  0x00007fba9645323b in socket_sendto () from /usr/local/lib/lua/5.1/socket/core.so
#1  0x00007fba96452a23 in meth_sendto () from /usr/local/lib/lua/5.1/socket/core.so
#2  0x0000000000420c9a in lj_BC_FUNCC ()
#3  0x000000000040f7b2 in lua_pcall (L=L@entry=0x7fba97323378, nargs=nargs@entry=0, nresults=<optimized out>, 
    errfunc=errfunc@entry=2) at lj_api.c:1130
#4  0x0000000000404d0c in docall (L=0x7fba97323378, narg=0, clear=0) at luajit.c:121
#5  0x00000000004059f1 in handle_script (argx=<optimized out>, L=0x7fba97323378) at luajit.c:292
#6  pmain (L=0x7fba97323378) at luajit.c:553
#7  0x0000000000420c9a in lj_BC_FUNCC ()
#8  0x000000000040f83b in lua_cpcall (L=L@entry=0x7fba97323378, func=func@entry=0x4052c0 <pmain>, ud=ud@entry=0x0)
    at lj_api.c:1154
#9  0x0000000000404784 in main (argc=2, argv=0x7ffd3b8b1d68) at luajit.c:582
(gdb) p errno
$1 = 11
(gdb) i breakpoint
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007fba96452920 <meth_sendto>
2       breakpoint     keep y   0x00007fba964531f0 <socket_sendto>
3       breakpoint     keep y   0x00007fba96452e40 <socket_waitfd>
	breakpoint already hit 41 times
4       breakpoint     keep y   0x00007fba96744e00 <poll>
	breakpoint already hit 41 times
5       breakpoint     keep y   0x00007fba96750820 <sendto>
	breakpoint already hit 41 times

The system call sendto() returns -1 and errno is set to 11 (EAGAIN or EWOULDBLOCK). My guess is that the local port numbers have reached the upper limit, so sendto fails with EAGAIN, while socket_waitfd still reports the fd as writable (IO_DONE), which results in an infinite loop. Is that possible?

./src/usocket.c:socket_send
for ( ;; ) {
        long put = (long) send(*ps, data, count, 0);
        /* if we sent anything, we are done */
        if (put >= 0) {
            *sent = put;
            return IO_DONE;
        }
        err = errno;
        /* EPIPE means the connection was closed */
        if (err == EPIPE) return IO_CLOSED;
        /* EPROTOTYPE means the connection is being closed (on Yosemite!)*/
        if (err == EPROTOTYPE) continue;
        /* we call was interrupted, just try again */
        if (err == EINTR) continue;
        /* if failed fatal reason, report error */
        if (err != EAGAIN) return err;
        /* wait until we can send something or we timeout */
        if ((err = socket_waitfd(ps, WAITFD_W, tm)) != IO_DONE) return err;
    }
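To make the suspected spin concrete, here is a minimal sketch of the same loop shape with the system calls replaced by stubs: the send always fails with EAGAIN and the wait always reports the fd as writable. It also shows one possible way to bound such a loop by capping consecutive EAGAIN results. This is an illustration only, not luasocket's code or an agreed fix; the names send_with_retry_cap, always_eagain, always_writable, SK_DONE and SK_TIMEOUT are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical status codes, modelled loosely on the IO_DONE name used above. */
enum { SK_DONE = 0, SK_TIMEOUT = -1 };

typedef long (*send_fn)(void *ctx); /* stands in for send()/sendto()   */
typedef int (*wait_fn)(void *ctx);  /* stands in for socket_waitfd()   */

/*
 * Same loop shape as the snippet above, plus a cap on how many EAGAIN
 * results are tolerated. Without the cap, a send that always fails with
 * EAGAIN combined with a wait that always succeeds never leaves the loop.
 */
int send_with_retry_cap(send_fn do_send, wait_fn do_wait, void *ctx,
                        int max_eagain, long *sent)
{
    int eagain_count = 0;

    assert(do_send != NULL && do_wait != NULL && sent != NULL);

    for (;;) {
        long put = do_send(ctx);
        int err;

        if (put >= 0) {            /* something was sent: done */
            *sent = put;
            return SK_DONE;
        }
        err = errno;
        if (err == EINTR) continue;        /* interrupted: retry */
        if (err != EAGAIN) return err;     /* fatal: report it   */
        if (++eagain_count > max_eagain) return EAGAIN; /* give up */
        if ((err = do_wait(ctx)) != SK_DONE) return err;
    }
}

/* Stub: counts calls in *ctx and always fails with EAGAIN. */
long always_eagain(void *ctx)
{
    ++*(int *)ctx;
    errno = EAGAIN;
    return -1;
}

/* Stub: always claims the fd is writable. */
int always_writable(void *ctx)
{
    (void)ctx;
    return SK_DONE;
}
```

Calling send_with_retry_cap(always_eagain, always_writable, &calls, 40, &sent) with calls initialised to 0 returns EAGAIN after 41 send attempts instead of spinning. Whether a cap, a short sleep, or treating this situation as a hard error is the right behaviour for the library is a design question this sketch does not settle.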

And here is my env:

$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core) 
$ uname -r
3.10.0-514.21.1.el7.x86_64
$ luajit -v
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/

Best regards, THANKS.

mz198 avatar Apr 16 '20 05:04 mz198