ip2proxy-nginx
Segfaults on nginx 1.18+ and 100% cpu system usage with ip2proxy
Hello. On a moderately loaded nginx 1.18+ server (2000-4000+ rps, proxy_pass to HTTP backends) with the ip2location and ip2proxy modules compiled in:
I checked nginx versions 1.18.0 and 1.20.1 and both show these bugs (the nginx build flags are the same on 1.16.1, 1.18.0, and 1.20.1).
OS: Debian 9 x86_64 Stretch. Compiled nginx flags:
nginx version: nginx/1.18.0 built by gcc 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) built with OpenSSL 1.1.1d 10 Sep 2019 TLS SNI support enabled configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --add-module=/root/compile/ip2mod-location --add-module=/root/compile/ip2mod-proxy --with-cc-opt='-g -O2 -fdebug-prefix-map=/root/compile/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
I use the latest libraries for building ip2location/ip2proxy with nginx:
https://github.com/chrislim2888/IP2Location-C-Library/archive/master.zip
https://github.com/ip2location/ip2proxy-c/archive/master.zip
https://github.com/ip2location/ip2location-nginx/archive/master.zip
https://github.com/ip2location/ip2proxy-nginx/archive/master.zip
Dmesg errors:
[Mon Oct 25 05:25:21 2021] show_signal_msg: 66 callbacks suppressed
[Mon Oct 25 05:25:21 2021] nginx[12799]: segfault at 55ef2ac5d359 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] nginx[12831]: segfault at 55ef2ac5d37d ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] nginx[12814]: segfault at 55ef2ac5d331 ip 00007f9cae118c77 sp 00007ffdcd1c37c0 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] traps: nginx[12821] trap invalid opcode ip:7f9cae118c76 sp:7ffdcd1c3800 error:0
[Mon Oct 25 05:25:21 2021] in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] nginx[12838]: segfault at 55ef2ac5d3f0 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] traps: nginx[12859] trap invalid opcode ip:7f9cae118c76 sp:7ffdcd1c3800 error:0
[Mon Oct 25 05:25:21 2021] in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] nginx[12864]: segfault at 55ef2ac5d3b0 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] nginx[12884]: segfault at 55ef2ac5d335 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] nginx[12883]: segfault at 55ef2ac5d37b ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
[Mon Oct 25 05:25:21 2021] nginx[12905]: segfault at 55ef2ac5d3d8 ip 00007f9cae118c77 sp 00007ffdcd1c3800 error 4 in libIP2Proxy.so.2.0.0[7f9cae115000+5000]
And that is not all. I also have a strange problem where nginx workers consume 100% CPU (system usage in top), and at that moment the nginx error log fills with tons of messages like this:
2021/10/28 22:41:06 [error] 34416#34416: *51265 upstream prematurely closed connection while reading response header from upstream.
But the upstream is fine: it is a very fast upstream written in Go, based on Iris, and with nginx 1.16.1 there are no problems at all.
strace attached to any nginx worker stuck at 100% CPU shows lseek in an infinite loop on a single descriptor, like this:
lseek(1129, 70376488, SEEK_SET) = 70376488
lseek(1129, 70376488, SEEK_SET) = 70376488
lseek(1129, 70376488, SEEK_SET) = 70376488
lseek(1129, 70376488, SEEK_SET) = 70376488
lseek(1129, 70376488, SEEK_SET) = 70376488
lseek(1129, 70376488, SEEK_SET) = 70376488
lseek(1129, 70376488, SEEK_SET) = 70376488
With nginx 1.16.1 none of these problems occur.
But I cannot use a newer nginx with the ip2location and ip2proxy libraries :(
I don't know whether the ip2location library has any problems with nginx 1.18+, but ip2proxy definitely does.
May I know what configuration you use to load the database?
Nginx settings:
user www-data www-data;
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 16384;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
use epoll;
worker_connections 16384;
multi_accept on;
}
Only two module directives are used (shown with the rest of the http block):
http {
ip2location_database /etc/ip2location/IPV6-COUNTRY.BIN;
ip2proxy_database /etc/ip2proxy/IP2PROXY-IP-PROXYTYPE-COUNTRY.BIN;
proxy_redirect off;
proxy_max_temp_file_size 0;
proxy_connect_timeout 1;
proxy_send_timeout 30;
proxy_read_timeout 60;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
lingering_time 30;
lingering_timeout 5;
client_header_timeout 240;
client_body_timeout 240;
send_timeout 120;
keepalive_timeout 300;
reset_timedout_connection on;
server_tokens off;
charset utf-8;
client_max_body_size 1024m; # Not used, but configured with a large size
client_body_buffer_size 256k;
The upstream esd is used for small POST JSON data. This is the upstream configuration (with or without keepalive makes no difference, same result):
upstream esd {
server 127.0.0.1:1234;
keepalive 128;
}
location /asea {
proxy_set_header Host domain.com;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header X-GEOIP-COUNTRY-CODE $ip2location_country_short;
proxy_set_header X-PROXY $ip2proxy_is_proxy;
proxy_set_header X-PROXY-TYPE $ip2proxy_proxy_type;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_pass http://esd;
}
Correction. Now I use the latest ip2proxy/ip2location with nginx 1.16.1 (because I recompiled from scratch), and I sometimes get the infinite lseek loop with 100% system CPU usage by nginx workers here too.
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
lseek(261, 3490838696, SEEK_SET) = 3490838696
Tomorrow I will try to build nginx 1.16.1 with old versions (around a year old) of ip2proxy/ip2location and test again...
I successfully built nginx 1.16.1 with old versions of ip2location and ip2proxy, these commits from 2020:
wget https://github.com/chrislim2888/IP2Location-C-Library/archive/9a99a06a99bffbc5779d64ab62a9cef839ef6a63.zip
wget https://github.com/ip2location/ip2proxy-c/archive/436bd401f49a94cf706192ae5e62bc72cfef799c.zip
wget https://github.com/ip2location/ip2location-nginx/archive/08ec371b7dd188c5b727b2533a39ec3988a7c1ad.zip
wget https://github.com/ip2location/ip2proxy-nginx/archive/8b378603dfb0190a6c0d06c17b9ec6c77014eebc.zip
Previously I used the shared_memory access_type option for ip2location and ip2proxy, and now I have changed the nginx config back to:
ip2location_database /etc/ip2location/IPV6-COUNTRY.BIN;
ip2location_access_type shared_memory;
ip2proxy_database /etc/ip2proxy/IP2PROXY-IP-PROXYTYPE-COUNTRY.BIN;
ip2proxy_access_type shared_memory;
Now I am continuing the tests with the old library versions and with the shared_memory access_type, as before.
Same problem: 100% CPU usage by workers in the lseek loop. With shared_memory, nginx reaches this bad state even faster. Can newer databases affect this? Right now I don't know how to fix this. Building a version from 2019 would not help, I think.
Current status:
I installed the latest version of the libraries with nginx 1.16.1; so far this setup shows the fewest problems.
Could you change line 352 to the code below and test whether it works?
IP2Proxy_open_mem(gcf->handler, IP2LOCATION_SHARED_MEMORY);
By default I use nginx and the latest library versions with:
IP2Proxy_open_mem(gcf->handler, IP2PROXY_CACHE_MEMORY);
IP2Location_open_mem(gcf->handler, IP2LOCATION_CACHE_MEMORY);
But now I have recompiled nginx and the latest library versions with:
IP2Proxy_open_mem(gcf->handler, IP2PROXY_SHARED_MEMORY);
IP2Location_open_mem(gcf->handler, IP2LOCATION_SHARED_MEMORY);
I think it is better to compile both libraries with a single memory method.
Please wait a little while I see how it works.
Nginx 1.16.1 with:
IP2Proxy_open_mem(gcf->handler, IP2PROXY_SHARED_MEMORY);
IP2Location_open_mem(gcf->handler, IP2LOCATION_SHARED_MEMORY);
Same result: 100% CPU consumption (system usage), but dmesg shows no abnormal segfaults in libIP2Proxy.so.2.0.0 (I see the segfaults only on nginx 1.18+). Maybe the 100% CPU consumption is not a problem in the ip2proxy or ip2location library, but I cannot disable ip2proxy and ip2location to check, because our service depends on these libraries and cannot work without them.
I searched Google and could not find any similar problem reports for plain nginx matching my case.
I attach a screenshot from top:

Now strace on any nginx worker at 100% CPU shows a different picture, not a simple loop with a single SEEK_SET:
read(263, "", 4096) = 0
lseek(263, 3728, SEEK_CUR) = 1056400
read(263, "", 4096) = 0
lseek(263, 2249146368, SEEK_SET) = 2249146368
read(263, "", 3740) = 0
lseek(263, 3740, SEEK_CUR) = 3853258072
read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 4096) = 4096
lseek(263, 531161088, SEEK_SET) = 531161088
read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 4096) = 4096
lseek(263, 531165184, SEEK_SET) = 531165184
lseek(263, 3967135744, SEEK_SET) = 3967135744
read(263, "", 4096) = 0
lseek(263, 1268, SEEK_CUR) = 1053940
read(263, "", 4096) = 0
lseek(263, 3967135744, SEEK_SET) = 3967135744
read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 1280) = 1280
read(263, "\304\367\240\5\320\350\240\5\331\207\0\1\312\367\240\5\227\365\240\5\332\207\0\1\304\367\240\5\320\350\240\5"..., 4096) = 4096
lseek(263, 1390153728, SEEK_SET) = 1390153728
read(263, "", 4096) = 0
lseek(263, 2912, SEEK_CUR) = 1055584
read(263, "\304\367\240\5\320\350\240\5\331\207\0\1\312\367\240\5\227\365\240\5\332\207\0\1\304\367\240\5\320\350\240\5"..., 4096) = 4096
lseek(263, 1390153728, SEEK_SET) = 1390153728
read(263, "", 4096) = 0
lseek(263, 2924, SEEK_CUR) = 1055596
read(263, "", 4096) = 0
lseek(263, 2249146368, SEEK_SET) = 2249146368
read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 3728) = 3728
read(263, "\4\205\0\1\312\367\240\5\227\365\240\5\5\205\0\1\304\367\240\5\320\350\240\5\t\205\0\1\312\367\240\5"..., 4096) = 4096
lseek(263, 2249154192, SEEK_SET) = 2249154192
lseek(263, 531161088, SEEK_SET) = 531161088
read(263, "", 4096) = 0
lseek(263, 2088, SEEK_CUR) = 1054760
read(263, "\304\367\240\5\320\350\240\5\331\207\0\1\312\367\240\5\227\365\240\5\332\207\0\1\304\367\240\5\320\350\240\5"..., 4096) = 4096
lseek(263, 531161088, SEEK_SET) = 531161088
I am going to build a test environment with different nginx versions plus the ip2proxy/ip2location libraries, a backend daemon (which in the test environment simply writes the data to /dev/null), and a client coroutine that simulates real requests with generated POST data. Then we can also test nginx without the ip2proxy/ip2location libraries.
I want to reproduce these problems (the segfaults and the 100% system CPU usage by nginx) in the test environment; then you can download the ready environment and debug it yourself, because I cannot find and fix these bugs myself. Please wait around 1-2 weeks.
For now I simply restart nginx on my production servers when system CPU usage exceeds 50%.
Several days ago I built a new nginx from the packages.sury.org repository: nginx 1.20.1 with the latest ip2proxy/ip2location libraries (default CACHE_MEMORY build):
With this nginx I don't have any segfaults. But the problem with periodic 100% system CPU usage in an lseek loop on a single descriptor remains. I think I had this problem earlier too and just didn't notice it: we had less traffic, and our monitoring automatically restarted nginx whenever it couldn't connect, without any notification.
I also disabled:
proxy_buffering off;
proxy_request_buffering off;
and enabled:
proxy_ignore_client_abort on;
But the result is the same: periodic 100% system CPU usage.
OK, I will continue building the test environment and try to reproduce the problem synthetically.
This is current build without any segfaults:
nginx -V nginx version: nginx/1.20.1 built with OpenSSL 1.1.1d 10 Sep 2019 TLS SNI support enabled configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/root/compile/nginx-1.20.1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_flv_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_mp4_module --with-http_perl_module=dynamic --with-http_random_index_module --with-http_secure_link_module --with-http_sub_module --with-http_xslt_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-stream=dynamic --with-stream_geoip_module=dynamic --with-stream_ssl_module --with-stream_ssl_preread_module --add-module=/root/compile/ip2mod-location --add-module=/root/compile/ip2mod-proxy --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/brotli --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-headers-more-filter --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-auth-pam --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-cache-purge 
--add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-dav-ext --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-ndk --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-echo --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-fancyindex --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-geoip2 --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/nchan --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-lua --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/rtmp --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-uploadprogress --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-upstream-fair --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/http-subs-filter --add-dynamic-module=/root/compile/nginx-1.20.1/debian/modules/ssl-ct
Can you please try recompiling with the latest IP2Location-C library without IP2Proxy?
Yes, I will try a little later.
Hello. I found what the problem was. It's funny, but I had a slightly incorrect script that periodically downloaded and updated the ip2location and ip2proxy databases. In other words, sometimes the database file was accidentally truncated by several kilobytes or even by 1-2 megabytes. Nginx starts up with the broken database files without any complaint, but then the problems I described above can begin.
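For anyone hitting the same thing: the safe pattern is to never update the live database file in place. Below is a minimal Python sketch of the idea (the function name, paths, and minimum size are hypothetical, not my actual script): write the download into a temporary file in the same directory, check its size, and only then swap it in with `os.replace()`, which is atomic on POSIX, so nginx can never open a half-written file.

```python
import os
import tempfile


def install_database(data: bytes, live_path: str, min_size: int) -> bool:
    """Write `data` next to `live_path` and atomically swap it in.

    Returns False (leaving the live file untouched) if the download
    looks truncated, i.e. smaller than `min_size` bytes.
    """
    directory = os.path.dirname(live_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk
        # Reject obviously truncated downloads instead of installing them.
        if os.path.getsize(tmp_path) < min_size:
            os.unlink(tmp_path)
            return False
        os.replace(tmp_path, live_path)  # atomic swap on POSIX
        return True
    except OSError:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        return False
```

With this pattern a cut-off download is simply discarded, and a running nginx worker keeps reading the previous, intact database file.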
Ideally, nginx should simply refuse to start with broken IP database files. It would probably be nice to have such a check at startup.
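As an illustration of what such a startup check could look like (this is not the module's actual code; the 64-byte minimum and the value ranges below are assumptions for illustration, since the real BIN header layout differs between database products and versions):

```python
import os
import struct


def looks_like_valid_bin(path: str, min_size: int = 64) -> bool:
    """Very loose sanity check of an IP2Location/IP2Proxy-style BIN file.

    Only verifies that the file exists, is at least `min_size` bytes,
    and that its first two bytes (database type and column count in the
    published formats) hold small positive values. The thresholds are
    illustrative assumptions, not the official specification.
    """
    try:
        if os.path.getsize(path) < min_size:
            return False
        with open(path, "rb") as f:
            db_type, columns = struct.unpack("BB", f.read(2))
    except OSError:
        return False
    # Both fields are small positive integers in every published format.
    return 1 <= db_type <= 50 and 1 <= columns <= 30
```

Even a crude check like this would have caught my truncated downloads at nginx startup instead of letting the workers loop on lseek at runtime.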
Please close this issue.
After a long time:

- The segfault problem is not a problem with the IP2 libraries. I saw it on only one server, and after recompiling and reinstalling I have not seen it again.
- The 100% system CPU usage on some workers is a problem I see only with the IP2 libraries compiled into nginx. I tried old versions, shared memory, etc.
Shared and cache memory differ only in the strace output; strace shows the endless lseek loop in all cases.
This is the same issue as here: https://github.com/ip2location/ip2location-nginx/issues/13. But in my case nginx is very heavily loaded, and I see random workers fall to 100% system CPU usage not only after an nginx reload but also during normal operation.
Kindly try it again with the latest version.