ip2proxy-nginx icon indicating copy to clipboard operation
ip2proxy-nginx copied to clipboard

Segfaults on nginx 1.18+ and 100% cpu system usage with ip2proxy

Open asyslinux opened this issue 2 years ago • 14 comments

Hello, on middle loaded nginx 1.18+ server (2000-4000+ rps and proxy_pass to http backends) with compiled ip2location and ip2proxy:

I checked 1.18.0 and 1.20.1 versions of nginx with this bugs (nginx flags same on 1.16.1, 1.18.0, 1.20.1).

OS: Debian 9 x86_64 Stretch Compiled nginx flags:

nginx version: nginx/1.18.0 built by gcc 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) built with OpenSSL 1.1.1d 10 Sep 2019 TLS SNI support enabled configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --add-module=/root/compile/ip2mod-location --add-module=/root/compile/ip2mod-proxy --with-cc-opt='-g -O2 -fdebug-prefix-map=/root/compile/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

I use latest libraries for building ip2location/ip2proxy with nginx:

https://github.com/chrislim2888/IP2Location-C-Library/archive/master.zip https://github.com/ip2location/ip2proxy-c/archive/master.zip https://github.com/ip2location/ip2location-nginx/archive/master.zip https://github.com/ip2location/ip2proxy-nginx/archive/master.zip

Dmesg errors:

[Mon Oct 25 05:25:21 2021] show_signal_msg: 66 callbacks suppressed [Mon Oct 25 05:25:21 2021] nginx[12799]: segfault at 55ef2ac5d359 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] nginx[12831]: segfault at 55ef2ac5d37d ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] nginx[12814]: segfault at 55ef2ac5d331 ip 00007f9cae118c77 sp 00007ffdcd1c37c0 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] traps: nginx[12821] trap invalid opcode ip:7f9cae118c76 sp:7ffdcd1c3800 error:0 [Mon Oct 25 05:25:21 2021] in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] nginx[12838]: segfault at 55ef2ac5d3f0 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] traps: nginx[12859] trap invalid opcode ip:7f9cae118c76 sp:7ffdcd1c3800 error:0 [Mon Oct 25 05:25:21 2021] in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] nginx[12864]: segfault at 55ef2ac5d3b0 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] nginx[12884]: segfault at 55ef2ac5d335 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] nginx[12883]: segfault at 55ef2ac5d37b ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000] [Mon Oct 25 05:25:21 2021] nginx[12905]: segfault at 55ef2ac5d3d8 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]

And this is not all, also have a strange trouble when nginx workers consume 100% with system usage(in the top), and in error logs of nginx in this moment i have a tons of messages like this:

2021/10/28 22:41:06 [error] 34416#34416: *51265 upstream prematurely closed connection while reading response header from upstream.

But upstream is ok - this is very quick upstream written on Go language based on Iris and in case of nginx 1.16.1 no have any problems.

strace connected to any 100% cpu usage nginx worker shows lseek method in infinity loop to only one descriptor like this:

lseek(1129, 70376488, SEEK_SET) = 70376488 lseek(1129, 70376488, SEEK_SET) = 70376488 lseek(1129, 70376488, SEEK_SET) = 70376488 lseek(1129, 70376488, SEEK_SET) = 70376488 lseek(1129, 70376488, SEEK_SET) = 70376488 lseek(1129, 70376488, SEEK_SET) = 70376488 lseek(1129, 70376488, SEEK_SET) = 70376488

But with 1.16.1 version no have any of this problems.

But I cannot use more newer nginx with ip2location and ip2proxy library :(

I don`t know ip2location library have any problems or not with nginx 1.18+, but ip2proxy have.

asyslinux avatar Oct 29 '21 04:10 asyslinux

May I know what configuration you use to load the database?

ip2location avatar Oct 29 '21 04:10 ip2location

Nginx settings:

user www-data www-data;
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 16384;

error_log /var/log/nginx/error.log warn;

pid /var/run/nginx.pid;

events {
        use epoll;
        worker_connections 16384;
        multi_accept on;
}

Only two directives:

http {

        ip2location_database /etc/ip2location/IPV6-COUNTRY.BIN;
        ip2proxy_database /etc/ip2proxy/IP2PROXY-IP-PROXYTYPE-COUNTRY.BIN;

        proxy_redirect off;
        proxy_max_temp_file_size                0;

        proxy_connect_timeout      1;
        proxy_send_timeout         30;
        proxy_read_timeout         60;

        sendfile                                on;
        tcp_nopush                              on;
        tcp_nodelay                             on;
        lingering_time                          30;
        lingering_timeout                       5;
        client_header_timeout                   240;
        client_body_timeout                     240;
        send_timeout                            120;
        keepalive_timeout                       300;
        reset_timedout_connection               on;
        server_tokens                           off;
        charset                                 utf-8;

        client_max_body_size                    1024m; #Not used, but configured large size
        client_body_buffer_size                 256k;

Upstream esd used for small POST JSON Data. And this is configuration for upstream(with keepalive or without not necessary, same result):

upstream esd {
               server 127.0.0.1:1234;
               keepalive 128;
}

location /asea {

                proxy_set_header Host domain.com;

                proxy_http_version 1.1;
                proxy_set_header Connection "";

                proxy_set_header X-GEOIP-COUNTRY-CODE $ip2location_country_short;
                proxy_set_header X-PROXY $ip2proxy_is_proxy;
                proxy_set_header X-PROXY-TYPE $ip2proxy_proxy_type;

                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-Proto https;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

                proxy_pass http://esd;

}

asyslinux avatar Oct 29 '21 04:10 asyslinux

Correction. Now I use latest ip2proxy/ip2location with Nginx 1.16.1 (because i recompile from zero) and have some times infinity lseek too with 100% system cpu usage by nginx workers.

lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696 lseek(261, 3490838696, SEEK_SET) = 3490838696

Tomorrow I try to build nginx 1.16.1 with old version(around 1+ year ago) of ip2proxy/ip2location, and test again...

asyslinux avatar Oct 29 '21 16:10 asyslinux

I successfully build nginx 1.16.1 with old version of ip2location and ip2proxy, this commits from 2020 year:

wget https://github.com/chrislim2888/IP2Location-C-Library/archive/9a99a06a99bffbc5779d64ab62a9cef839ef6a63.zip wget https://github.com/ip2location/ip2proxy-c/archive/436bd401f49a94cf706192ae5e62bc72cfef799c.zip wget https://github.com/ip2location/ip2location-nginx/archive/08ec371b7dd188c5b727b2533a39ec3988a7c1ad.zip wget https://github.com/ip2location/ip2proxy-nginx/archive/8b378603dfb0190a6c0d06c17b9ec6c77014eebc.zip

Before, I used access_type shared_memory option for ip2location and ip2proxy, and now I changed nginx config back to:


        ip2location_database /etc/ip2location/IPV6-COUNTRY.BIN;
        ip2location_access_type shared_memory;

        ip2proxy_database /etc/ip2proxy/IP2PROXY-IP-PROXYTYPE-COUNTRY.BIN;
        ip2proxy_access_type shared_memory;

Now I continue test with old versions of libraries and with using shared_memory access_type early.

asyslinux avatar Oct 30 '21 08:10 asyslinux

Same problem, 100% CPU usage by workers with lseek. With shared_memory nginx more quickly reach this bad condition. Can newer databases affect this? Now I don`t know how to fix this. Build version from 2019, I think this is not help.

Current status:

I installed latest version of libraries with nginx 1.16.1, now this setup shows minimal problems.

asyslinux avatar Oct 30 '21 09:10 asyslinux

Can I check if you changed this line 352 to the below and test if works?

IP2Proxy_open_mem(gcf->handler, IP2LOCATION_SHARED_MEMORY);

chrislim2888 avatar Oct 31 '21 01:10 chrislim2888

Can I check if you changed this line 352 to the below and test if works?

IP2Proxy_open_mem(gcf->handler, IP2LOCATION_SHARED_MEMORY);

By default I use nginx and latest version libraries with: IP2Proxy_open_mem(gcf->handler, IP2PROXY_CACHE_MEMORY); IP2Location_open_mem(gcf->handler, IP2LOCATION_CACHE_MEMORY);

But now I recompile nginx and latest version libraries with:

IP2Proxy_open_mem(gcf->handler, IP2PROXY_SHARED_MEMORY); IP2Location_open_mem(gcf->handler, IP2LOCATION_SHARED_MEMORY);

I think better compile two libraries with single memory method.

Please wait a little, I look how it works.

asyslinux avatar Oct 31 '21 05:10 asyslinux

Nginx 1.16.1 with:

IP2Proxy_open_mem(gcf->handler, IP2PROXY_SHARED_MEMORY); IP2Location_open_mem(gcf->handler, IP2LOCATION_SHARED_MEMORY);

Same result with 100% cpu consumption(by system usage), dmesg not shown any abnormal segfaults with libIP2Proxy.so.2.0.0 (segfaults I look only on 1.18+ nginx). May be 100% cpu consumption is not problem with ip2proxy or ip2location library, but i can not disable ip2proxy and ip2location and check, because our service depends on this libraries, and without libraries can not work.

I look at google and can`t find any other problems with clear nginx, like current problem in my case.

I attach screenshot from top:

top

Now strace on any 100% cpu nginx workers shows other result, not a simple loop with single SEEK_SET:

read(263, "", 4096) = 0 lseek(263, 3728, SEEK_CUR) = 1056400 read(263, "", 4096) = 0 lseek(263, 2249146368, SEEK_SET) = 2249146368 read(263, "", 3740) = 0 lseek(263, 3740, SEEK_CUR) = 3853258072 read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 4096) = 4096 lseek(263, 531161088, SEEK_SET) = 531161088 read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 4096) = 4096 lseek(263, 531165184, SEEK_SET) = 531165184 lseek(263, 3967135744, SEEK_SET) = 3967135744 read(263, "", 4096) = 0 lseek(263, 1268, SEEK_CUR) = 1053940 read(263, "", 4096) = 0 lseek(263, 3967135744, SEEK_SET) = 3967135744 read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 1280) = 1280 read(263, "\304\367\240\5\320\350\240\5\331\207\0\1\312\367\240\5\227\365\240\5\332\207\0\1\304\367\240\5\320\350\240\5"..., 4096) = 4096 lseek(263, 1390153728, SEEK_SET) = 1390153728 read(263, "", 4096) = 0 lseek(263, 2912, SEEK_CUR) = 1055584 read(263, "\304\367\240\5\320\350\240\5\331\207\0\1\312\367\240\5\227\365\240\5\332\207\0\1\304\367\240\5\320\350\240\5"..., 4096) = 4096 lseek(263, 1390153728, SEEK_SET) = 1390153728 read(263, "", 4096) = 0 lseek(263, 2924, SEEK_CUR) = 1055596 read(263, "", 4096) = 0 lseek(263, 2249146368, SEEK_SET) = 2249146368 read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 3728) = 3728 read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 4096) = 4096 lseek(263, 2249154192, SEEK_SET) = 2249154192 lseek(263, 531161088, SEEK_SET) = 531161088 read(263, "", 4096) = 0 lseek(263, 2088, SEEK_CUR) = 1054760 read(263, "\304\367\240\5\320\350\240\5\331\207\0\1\312\367\240\5\227\365\240\5\332\207\0\1\304\367\240\5\320\350\240\5"..., 4096) = 4096 lseek(263, 531161088, SEEK_SET) = 531161088

asyslinux avatar Oct 31 '21 16:10 asyslinux

I think, and i try to build a test environment with different nginx versions + libraries ip2proxy/ip2location, backend daemon(in testing environment simply save data to /dev/null) and with a client coroutine simulation of real requests with generated POST data. Then we can test nginx without ip2proxy/ip2location libraries too.

I want to reproduce this problems(segfaults and 100% system cpu usage by nginx) in test environment, then You can download this ready environment and test with debug byself, because I can`t find and fix that bugs byself. Please wait around 1-2 weeks.

Now I simply restarts nginx on my production servers when cpu system usage > 50%.

asyslinux avatar Nov 03 '21 08:11 asyslinux

Several days ago I built a new nginx from packages.sury.org repository - Nginx 1.20.1 and latest ip2proxy/ip2location (CACHE_MEMORY default edition) libraries:

With this nginx I dont have any segfaults. But trouble with periodically 100% cpu system usage with lseek loop on single descriptor exists. I think that problem I had earlier, just didn't pay attention before, and we had less traffic, and our monitoring automatically restarts nginx if cant connect, without any notifications.

I also disabled:

proxy_buffering off; proxy_request_buffering off;

and enabled:

proxy_ignore_client_abort on;

But result same, periodically 100% system cpu usage.

Ok, I continue make a test environment and try to repeat synthetically that problem.

This is current build without any segfaults:

nginx -V nginx version: nginx/1.20.1 built with OpenSSL 1.1.1d 10 Sep 2019 TLS SNI support enabled configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/root/compile/nginx-1.20.1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_flv_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_mp4_module --with-http_perl_module=dynamic --with-http_random_index_module --with-http_secure_link_module --with-http_sub_module --with-http_xslt_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-stream=dynamic --with-stream_geoip_module=dynamic --with-stream_ssl_module --with-stream_ssl_preread_module --add-module=/root/compile/ip2mod-location --add-module=/root/compile/ip2mod-proxy --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/brotli --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-headers-more-filter --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-auth-pam --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-cache-purge --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-dav-ext --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-ndk --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-echo --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-fancyindex --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-geoip2 --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/nchan --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-lua --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/rtmp --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-uploadprogress --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-upstream-fair --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-subs-filter --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/ssl-ct

asyslinux avatar Nov 11 '21 13:11 asyslinux

Can you please try recompiling with the latest IP2Location-C library without IP2Proxy?

ip2location avatar Nov 11 '21 22:11 ip2location

Can you please try recompiling with the latest IP2Location-C library without IP2Proxy?

Yes, I try little later

asyslinux avatar Nov 15 '21 18:11 asyslinux

Hello. I found what was the problem. It's funny, but I had a slightly incorrectly written script that periodically downloaded and updated the ip2location and ip2proxy databases. In other words, sometimes the file with the database was accidentally cut off by several kilobytes or even by 1-2 megabytes. As a result, Nginx starts up with broken database files without problems. But then the problems that I described above may begin.

Ideally, Nginx should simply not start up with broken IP database files. It would probably be nice to have such a check at startup.

Please close this issue.

asyslinux avatar Jan 20 '22 20:01 asyslinux

After long time:

  1. Problem with segfaults it is not a problem with IP2 libraries. This is i see only on 1 server, and after recompile and reinstall i don`t be seen this problem.

  2. Problem with 100% system CPU usage on some workers - this is problem i see only with IP2 libraries compiled with Nginx. I tried old versions, shared memory etc.

Shared and cache memory have difference only in strace messages, but strace shows unlimited lseek in all cases.

This is same issue here: https://github.com/ip2location/ip2location-nginx/issues/13 But in my case i have very loaded nginx and see falling workers to 100% sys cpu usage by random nginx workers not only after nginx reload, but during work too.

asyslinux avatar Oct 08 '22 10:10 asyslinux

Kindly try it again with the latest version.

ip2location avatar Nov 09 '22 09:11 ip2location