unkillable connections to ftp port 21
What I Did
I have been using proftpd 1.2.x in a production environment for several years and recently upgraded to the latest stable 1.3.7c and run proftpd inside a Docker container. Since the upgrade, security scanners like Qualys and random (malicious?) users create hanging connections stuck on status "Authenticating" that are unkillable with ftpdctl and that ignore /etc/proftpd.conf settings "TimeoutNoTransfer", "TimeoutStalled", "TimeoutIdle". I am able to kill the connections by using the connection PID from ftptop and executing kill -9 <PID>. These hanging connections are causing CPU 100%.
Here's what I see in the docker logs for proftpd for one such malicious connection that was made on Jan 13 and remained in COMMAND (authenticating) for 270h!:
{"log":"2022-01-13 23:35:22,627 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): mod_delay/0.7: no DelayOnEvent rules configured with \"DelayTable none\" in effect, disabling module\n","stream":"stderr","time":"2022-01-13T23:35:22.627867719Z"}
{"log":"2022-01-13 23:35:22,628 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): FTP session opened.\n","stream":"stderr","time":"2022-01-13T23:35:22.628442412Z"}
{"log":"2022-01-13 23:35:22,706 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): USER admin: no such user found from 193.187.91.226 [193.187.91.226] to 172.31.0.2:21\n","stream":"stderr","time":"2022-01-13T23:35:22.707114467Z"}
{"log":"2022-01-13 23:35:22,786 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): USER admin: no such user found from 193.187.91.226 [193.187.91.226] to 172.31.0.2:21\n","stream":"stderr","time":"2022-01-13T23:35:22.786513963Z"}
The attacker tries to login with an invalid user admin but this connection is never terminated until I discover it 1.5 weeks after Jan 13 and manually kill the PID.
What I Expected/Wanted
I expect that connections should be dropped after failing user/passwd auth and that the proftpd.conf settings for "TimeoutNoTransfer", "TimeoutStalled", "TimeoutIdle" should apply to all connections. I also expected that ftpdctl kick <IP> should work on stalled connections. Note that ftpdctl kick works fine on non-zombie/malicious ftp sessions.
ProFTPD Version and Configuration
Please help us reproduce the problem/issue you are encountering. To do this, we need to know which version of ProFTPD you are using, how it was built, etc. The following command is an easy way to get all of this information:
$ proftpd -V
Compile-time Settings:
Version: 1.3.7c (maint)
Platform: LINUX [Linux 5.4.0-1057-gcp x86_64]
Built: Sun Sep 5 2021 00:00:00 UTC
Built With:
configure '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-depend
ency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--da
tadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var'
'--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--libexecdir=/usr/libexec/proftpd' '
--localstatedir=/run/proftpd' '--disable-strip' '--enable-ctrls' '--enable-dso' '--enable-facl' '--enable-ipv6' '--enab
le-memcache' '--enable-nls' '--enable-openssl' '--disable-pcre' '--disable-redis' '--enable-shadow' '--enable-tests=non
etwork' '--with-libraries=/usr/lib64/mariadb' '--with-includes=/usr/include/mysql' '--with-modules=mod_readme:mod_auth_
pam:mod_tls' '--with-shared=mod_sql:mod_sql_passwd:mod_sql_mysql:mod_sql_postgres:mod_sql_sqlite:mod_quotatab:mod_quota
tab_file:mod_quotatab_ldap:mod_quotatab_radius:mod_quotatab_sql:mod_ldap:mod_ban:mod_ctrls_admin:mod_facl:mod_load:mod_
vroot:mod_radius:mod_ratio:mod_rewrite:mod_site_misc:mod_exec:mod_shaper:mod_wrap2:mod_wrap2_file:mod_wrap2_sql:mod_cop
y:mod_deflate:mod_ifversion:mod_qos:mod_sftp:mod_sftp_pam:mod_sftp_sql:mod_tls_shmcache:mod_tls_memcache:mod_unique_id:
mod_ifsession' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CC=gcc' 'CFLAGS=-O2 -flto=au
to -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
-Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/r
edhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' '
LDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redh
at/redhat-annobin-cc1 ' 'CXX=g++' 'CXXFLAGS=-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pip
e -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-har
dened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-u
nwind-tables -fstack-clash-protection -fcf-protection'
CFLAGS: -g2 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-securit
y -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-st
rong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-pro
tection -fcf-protection -Wall -fno-omit-frame-pointer -fno-strict-aliasing -Werror=implicit-function-declaration
LDFLAGS: -L$(top_srcdir)/lib -L$(top_builddir)/lib -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redha
t/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -rdynamic -L/usr/lib64/mariadb -L/usr/lib64/ -L/usr
/lib64
LIBS: -lacl -lssl -lcrypto -lcap -lssl -lcrypto -lpam -lsupp -lattr -lresolv -lresolv -lcrypt -lmemcachedutil -lme
mcached
Files:
Configuration File:
/etc/proftpd.conf
Pid File:
/run/proftpd/proftpd.pid
Scoreboard File:
/run/proftpd/proftpd.scoreboard
Header Directory:
/usr/include/proftpd
Shared Module Directory:
/usr/libexec/proftpd
Info:
+ Max supported UID: 4294967295
+ Max supported GID: 4294967295
Features:
- Autoshadow support
+ Controls support
+ curses support
- Developer support
+ DSO support
+ IPv6 support
+ Largefile support
- Lastlog support
+ Memcache support
+ ncursesw support
+ NLS support
+ OpenSSL support (OpenSSL 1.1.1k FIPS 25 Mar 2021, FIPS enabled)
- PCRE support
+ POSIX ACL support
- Redis support
+ Sendfile support
+ Shadow file support
- Sodium support
+ Trace support
+ xattr support
Tunable Options:
PR_TUNABLE_BUFFER_SIZE = 1024
PR_TUNABLE_DEFAULT_RCVBUFSZ = 8192
PR_TUNABLE_DEFAULT_SNDBUFSZ = 8192
PR_TUNABLE_ENV_MAX = 2048
PR_TUNABLE_GLOBBING_MAX_MATCHES = 100000
PR_TUNABLE_GLOBBING_MAX_RECURSION = 8
PR_TUNABLE_HASH_TABLE_SIZE = 40
PR_TUNABLE_LOGIN_MAX = 256
PR_TUNABLE_NEW_POOL_SIZE = 512
PR_TUNABLE_PATH_MAX = 4096
PR_TUNABLE_SCOREBOARD_BUFFER_SIZE = 80
PR_TUNABLE_SCOREBOARD_SCRUB_TIMER = 30
PR_TUNABLE_SELECT_TIMEOUT = 30
PR_TUNABLE_TIMEOUTIDENT = 10
PR_TUNABLE_TIMEOUTIDLE = 600
PR_TUNABLE_TIMEOUTLINGER = 10
PR_TUNABLE_TIMEOUTLOGIN = 300
PR_TUNABLE_TIMEOUTNOXFER = 300
PR_TUNABLE_TIMEOUTSTALLED = 3600
PR_TUNABLE_XFER_SCOREBOARD_UPDATES = 10
I am using the proftpd v1.3.7c package provided by Fedora 35.
In addition, we need to see all of the ProFTPD configuration files you are using (minus any sensitive information like passwords, of course). Armed with the version and configuration data, then, we can set up ProFTPD locally using the same configuration, and see what happens.
proftpd conf files:
ftptop screen shots of zombie connections


Thanks for the detailed bug report! For these scanner connections, can you tell if they are only sending the USER command (for e.g. "admin"), or do they appear to hang after sending a PASS command? There's a lot more machinery involved in the processing of a PASS command; if these "unkillable connections" happen only with USER commands, that will narrow down the code paths to examine...
One way to tell about the above question is to use ps auxwww (especially as root); ProFTPD updates the process title for each session with its current/recent command. This means that ps auxwww will show a little more information than ftptop; I'm hoping that it can show us the last command/activity for these hanging connections.
One way to tell about the above question is to use ps auxwww (especially as root)
Sure -- I need to add ps to the stripped-down fedora35 container environment running proftpd and then next time I see a hanging connection I will run ps auxwww and share my findings with you. btw I am currently on holiday for the Lunar New Year in Asia so it will be several days before I can take a look at this. Thanks for your help.
Sorry for the delay in getting back to you! I created a new proftpd container image that includes ps and when I got another hanging connection I ran ftptop and ps auxwww. Here are the screenshots (although I blurred out the user name and IP from ps).


Note that this time the hanging connection came from a normal user I gave access to, not a scanner like Qualys.
The only difference between the ftptop and ps output is that the latter also shows the username of the connection stuck in "authenticating".
Thanks
Thanks! For that normal user session, did the client eventually send the PASS command (to complete the authentication), or did it just send USER, and then do nothing?
For the normal user session w/ PID 32148 the connection got stuck after USER was sent, but I think this is an isolated incident b/c user al... has been using proftpd without problems.
OK. I'm poring over the code base, to see what else might be happening here. In the mean time, the next time you see one of these stuck/unkillable processes, can you try doing a strace -f -p $PID on the process ID? That might provide some more information about what system calls it is doing, which can then help me narrow down where in the code those system calls are happening. Thanks!
We are also seeing this issue, though only from malicious connections so far.
An strace -f -p of the PID shows the same thing looping (two entries here):
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
alarm(5) = 0
rt_sigreturn({mask=[]}) = 35984944
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
alarm(5) = 0
rt_sigreturn({mask=[]}) = 35952464
Our config:
# /opt/proftpd/sbin/proftpd -V
Compile-time Settings:
Version: 1.3.7d (git)
Platform: LINUX [Linux 3.10.0-1160.53.1.el7.x86_64 x86_64]
Built: Fri Feb 25 2022 20:09:42 GMT
Built With:
configure '--prefix=/opt/proftpd-1.3.7d-56d500e' '--enable-ctrls' '--with-modules=mod_sftp:mod_tls:mod_ifsession:mod_ban:mod_auth_pam:mod_vroot'
CFLAGS: -g2 -O2 -Wall -fno-omit-frame-pointer -fno-strict-aliasing
LDFLAGS: -L$(top_srcdir)/lib -L$(top_builddir)/lib -rdynamic
LIBS: -lssl -lcrypto -lpam -lssl -lcrypto -lsupp -lcrypt -ldl
Files:
Configuration File:
/opt/proftpd-1.3.7d-56d500e/etc/proftpd.conf
Pid File:
/opt/proftpd-1.3.7d-56d500e/var/proftpd.pid
Scoreboard File:
/opt/proftpd-1.3.7d-56d500e/var/proftpd.scoreboard
Info:
+ Max supported UID: 4294967295
+ Max supported GID: 4294967295
Features:
- Autoshadow support
+ Controls support
- curses support
- Developer support
- DSO support
+ IPv6 support
+ Largefile support
- Lastlog support
- Memcache support
- ncurses support
- NLS support
+ OpenSSL support (OpenSSL 1.0.2k 26 Jan 2017, FIPS enabled)
- PCRE support
- POSIX ACL support
- Redis support
+ Sendfile support
+ Shadow file support
- Sodium support
+ Trace support
+ xattr support
Tunable Options:
PR_TUNABLE_BUFFER_SIZE = 1024
PR_TUNABLE_DEFAULT_RCVBUFSZ = 8192
PR_TUNABLE_DEFAULT_SNDBUFSZ = 8192
PR_TUNABLE_ENV_MAX = 2048
PR_TUNABLE_GLOBBING_MAX_MATCHES = 100000
PR_TUNABLE_GLOBBING_MAX_RECURSION = 8
PR_TUNABLE_HASH_TABLE_SIZE = 40
PR_TUNABLE_LOGIN_MAX = 256
PR_TUNABLE_NEW_POOL_SIZE = 512
PR_TUNABLE_PATH_MAX = 4096
PR_TUNABLE_SCOREBOARD_BUFFER_SIZE = 80
PR_TUNABLE_SCOREBOARD_SCRUB_TIMER = 30
PR_TUNABLE_SELECT_TIMEOUT = 30
PR_TUNABLE_TIMEOUTIDENT = 10
PR_TUNABLE_TIMEOUTIDLE = 600
PR_TUNABLE_TIMEOUTLINGER = 10
PR_TUNABLE_TIMEOUTLOGIN = 300
PR_TUNABLE_TIMEOUTNOXFER = 300
PR_TUNABLE_TIMEOUTSTALLED = 3600
PR_TUNABLE_XFER_SCOREBOARD_UPDATES = 10
The OS is Centos 7.9, kernel 3.10.0-1160.53.1.el7.x86_64.
@jpfinger Thanks for that info, that helps! The next time you can, can you run strace -f -p -t -T? The additional -t -T options will add the wall clock timestamps, and time spent in syscalls, to the strace output. For the above SIGALRM loop you provided, the timing can help us narrow down which timers are causing the SIGARLM signals, which in turn can narrow down the code path in question.
Unfortunately I had to kill the existing stuck sessions we had and we haven't seen any more yet. This is still on my radar and will get an update when we see it again.
Ok, we have one currently stuck at ~20 hours. Someone knows we're talking about them! I'll hold off on killing this one for a bit.
# strace -f -t -T -p 28387
strace: Process 28387 attached
19:13:36 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:36 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000019>
19:13:36 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000022>
19:13:36 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000016>
19:13:36 alarm(5) = 0 <0.000031>
19:13:36 rt_sigreturn({mask=[]}) = 40489792 <0.000017>
19:13:41 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:41 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000026>
19:13:41 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000021>
19:13:41 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000021>
19:13:41 alarm(5) = 0 <0.000022>
19:13:41 rt_sigreturn({mask=[]}) = 36750912 <0.000022>
19:13:46 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:46 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000025>
19:13:46 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000027>
19:13:46 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000020>
19:13:46 alarm(5) = 0 <0.000021>
19:13:46 rt_sigreturn({mask=[]}) = 35991104 <0.000021>
19:13:51 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:51 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000025>
19:13:51 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000022>
19:13:51 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000021>
19:13:51 alarm(5) = 0 <0.000021>
19:13:51 rt_sigreturn({mask=[]}) = 38513872 <0.000021>
Thanks! From that trace output, we do see that the SIGARLM signal is being sent to the process every 5 secs. None of the hardcoded default timeouts are 5s; @jpfinger do you have any 5s timeouts in your ProFTPD configuration anywhere (including contrib modules)?
In fact we do in mod_ctrls:
...
<IfModule mod_ctrls.c>
ControlsEngine on
ControlsACLs all allow user root
ControlsMaxClients 2
ControlsLog /some/path/logs/controls.log
ControlsInterval 5
ControlsSocket /some/path/ctrls.sock
ControlsSocketOwner root root
ControlsSocketACL allow user root
<IfModule mod_ctrls_admin.c>
AdminControlsACLs all allow user root
</IfModule>
</IfModule>
...
OK, great. I'll work through the mod_ctrls code, with an eye toward newly forming connections/sessions, see if anything turns up.
In the mean time, for the next time you see such unkillable processes, in addition to the strace -f -p -t -T command, I'm thinking it might (or might not) be useful to configure trace logging for timers, e.g.:
TraceLog /path/to/proftpd/trace.log
Trace timer:30
and then search the generated TraceLog for the PID of the unkillable process. That should confirm that it is indeed the ControlsInterval timer which keeps firing. In theory, it should not be that particular timer, since mod_ctrls is supposed to remove that timer on TCP connect (see here); that timer is meant only for the listening daemon process, not the forked session process.
Hmm; it turns out that there is another way that we might see these 5s SIGALRM signals, without any explicitly configured timeouts matching that 5s: see https://github.com/proftpd/proftpd/blob/master/src/timers.c#L191.
The Timer API is not entirely accurate, and in this case, it will default to checking any registered timer intervals/callbacks every 5s anyway. So that's possibly what's happening with this unkillable process.
Unfortunately, that doesn't provide any clues yet about how the process got into this state. But the fact that it's "unkillable" does suggest that timers/alarms are blocked for that process somehow -- I'll be adding trace logging, for the timer trace channel, for when alarms are being blocked/unblocked, to aid in this diagnosis.
@jpfinger for your stuck PID 28387, in the strace output, is there anything other than the SIGALRM? That is, in addition to this 5s SIGLARM signal being delivered, I would also expect to see some other system calls, such as read(2) on a socket or somesuch...
@Castaglia I ran the strace against it for 30 minutes this morning and all I got were the quoted SIGALRM lines.
Here are the timer trace log lines for one of these connections:
2022-04-30 00:39:23,379 [7139] <timer:7>: added timer ID 1 ('TimeoutLogin', for module 'auth'), triggering in 300 seconds
2022-04-30 00:39:23,380 [7139] <timer:7>: added timer ID 2 ('TimeoutIdle', for module 'core'), triggering in 600 seconds
2022-04-30 00:39:28,580 [7139] <timer:7>: reset timer ID 2 ('TimeoutIdle', for module 'core')
2022-04-30 00:39:28,696 [7139] <timer:7>: reset timer ID 2 ('TimeoutIdle', for module 'core')
2022-04-30 00:39:33,897 [7139] <timer:7>: reset timer ID 2 ('TimeoutIdle', for module 'core')
2022-04-30 00:39:34,013 [7139] <timer:7>: reset timer ID 2 ('TimeoutIdle', for module 'core')
Hi,
we are seeing the same issue since quite some time. I'm not sure we are running the same configuration, however today I actually attached the debugger.
(gdb) bt
#0 0x0000557bb1cb879c in free_blocks (pool_tag=<optimized out>, blok=0x557bb1fadf00) at pool.c:173
#1 clear_pool (p=0x557bb21ee8a0) at pool.c:580
#2 destroy_pool (p=0x557bb21ee8a0) at pool.c:611
#3 0x0000557bb1cb8f91 in destroy_pool (p=<optimized out>) at pool.c:591
#4 0x0000557bb1cecd69 in pr_fsio_close (fh=0x557bb21ee8e8) at fsio.c:5132
#5 0x0000557bb1d1dd14 in af_endgrent () at mod_auth_file.c:593
#6 authfile_endgrent (cmd=<optimized out>) at mod_auth_file.c:1238
#7 0x0000557bb1ce062e in pr_module_call (m=0x557bb1e25ec0 <auth_file_module>, func=0x557bb1d1dcf6 <authfile_endgrent>, cmd=cmd@entry=0x557bb21e2dc0)
at modules.c:59
#8 0x0000557bb1ce33db in dispatch_auth (cmd=cmd@entry=0x557bb21e2dc0, match=match@entry=0x557bb1dc42ea "endgrent", m=m@entry=0x0) at auth.c:425
#9 0x0000557bb1ce3c16 in pr_auth_endgrent (p=<optimized out>) at auth.c:523
#10 0x0000557bb1d20ea0 in auth_pre_pass (cmd=0x557bb201bdf8) at mod_auth.c:2495
#11 0x0000557bb1ce062e in pr_module_call (m=0x557bb1e26500 <auth_module>, func=0x557bb1d20e6b <auth_pre_pass>, cmd=cmd@entry=0x557bb201bdf8) at modules.c:59
#12 0x0000557bb1cb3e83 in _dispatch (cmd=cmd@entry=0x557bb201bdf8, cmd_type=cmd_type@entry=1, validate=validate@entry=0, match=0x557bb201be90 "PASS",
match@entry=0x0) at main.c:366
#13 0x0000557bb1cb6b17 in pr_cmd_dispatch_phase (cmd=0x557bb201bdf8, phase=phase@entry=0, flags=flags@entry=3) at main.c:678
#14 0x0000557bb1cb6d3e in pr_cmd_dispatch (cmd=<optimized out>) at main.c:801
#15 cmd_loop (server=<optimized out>, c=<optimized out>) at main.c:944
#16 0x0000557bb1cb4e21 in fork_server (fd=<optimized out>, l=<optimized out>, no_fork=<optimized out>) at main.c:1517
#17 0x0000557bb1cb5bb6 in daemon_loop () at main.c:1766
#18 0x0000557bb1cb351e in standalone_main () at main.c:1967
#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at main.c:2792
Looking into free_blocks it seems there is a reference to itself (fn_s_s_p_blok_hnext does s + s + p blok->h.next):
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$19 = (union block_hdr *) 0x5573c6883e20
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$20 = (union block_hdr *) 0x5573c6882c50
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$21 = (union block_hdr *) 0x5573c68843c0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$22 = (union block_hdr *) 0x5573c6882b60
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$23 = (union block_hdr *) 0x5573c6884a50
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$24 = (union block_hdr *) 0x5573c6882a70
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$25 = (union block_hdr *) 0x5573c6883d30
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$26 = (union block_hdr *) 0x5573c6882980
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$27 = (union block_hdr *) 0x5573c68841e0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$28 = (union block_hdr *) 0x5573c69458c0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$29 = (union block_hdr *) 0x5573c6945af0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$30 = (union block_hdr *) 0x5573c694f3f0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$31 = (union block_hdr *) 0x5573c6950630
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$32 = (union block_hdr *) 0x5573c694f1c0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$33 = (union block_hdr *) 0x5573c68831f0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$34 = (union block_hdr *) 0x5573c68845a0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$35 = (union block_hdr *) 0x5573c6883010
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$36 = (union block_hdr *) 0x5573c6884000
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$37 = (union block_hdr *) 0x5573c6883100
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$38 = (union block_hdr *) 0x5573c6884690
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$39 = (union block_hdr *) 0x5573c6882f20
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$40 = (union block_hdr *) 0x5573c6883f10
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$41 = (union block_hdr *) 0x5573c6882e30
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$42 = (union block_hdr *) 0x5573c68844b0
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$43 = (union block_hdr *) 0x5573c6882d40
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
$44 = (union block_hdr *) 0x5573c6883e20
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
# !! at this point we are starting again !!
$45 = (union block_hdr *) 0x5573c6882c50
(gdb) fn_s_s_p_blok_hnext
175 in pool.c
173 in pool.c
[...]
One more thing to add. The process table entry always is: <pid> ftp .... proftpd: a.b.c.d:12345: USER xyz. xyz is a valid user, however no authentication has happened as the process is still running with the unprivileged user. Running ProFTPD 1.3.8rc3
@manuelm Thanks, that is very helpful! I'll focus some attention in that area, see if I can figure out how that particular clear_pool() behavior can happen.
@manuelm would you be able to try out https://github.com/proftpd/proftpd/commit/16870859b39b891b68e3708d76fa843747051409 in your environment, see if it helps clear up this issue in your environment?
@manuelm would you be able to try out 1687085 in your environment, see if it helps clear up this issue in your environment?
Yeah sure. Rolling out the patch as we speak. It usually takes a few days, maybe even 1-2 weeks for the bug to happen.
Due to kernel updates and subsequent reboots it took longer to verify than expected. However we are seeing this issue again.
backtrace is the same. Loop in pool.c:173
@manuelm Got it. Thanks for the update. I should've looked closer at the stack trace you provided; the loop in the linked list is quite large, and not as simple as assumed by my PR.
To help unwind this linked list loop, would it be possible to print out the pool_tag variable in gdb, for each of those layers? I know it might be tedious (I wish I could reproduce this locally!), but knowing those tags of the pools would help identify where those pools come from in the source code -- which, in turn, would help me understand how this loop is created in the first place.
To help unwind this linked list loop, would it be possible to print out the
pool_tagvariable ingdb, for each of those layers?
(gdb) bt
#0 0x0000562a6b1f773f in free_blocks (pool_tag=<optimized out>, blok=0x562a6cd6e530) at pool.c:175
#1 clear_pool (p=0x562a6cd73cb0) at pool.c:589
#2 destroy_pool (p=0x562a6cd73cb0) at pool.c:620
#3 0x0000562a6b1f7f2e in destroy_pool (p=<optimized out>) at pool.c:600
[...]
Verify loop:
(gdb) p ((pool *)0x562a6cd73cb0)->first
(union block_hdr *) 0x562a6cd73c90
(gdb) p ((pool *)0x562a6cd73cb0)->first->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next->h.next
(union block_hdr *) 0x562a6cd73c90
all ancestors:
(gdb) p ((pool *)0x562a6cd73cb0)->tag
0x0
(gdb) p ((pool *)0x562a6cd73cb0)->parent->tag
0x562a6b30439f "FS Pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->parent->tag
0x562a6b2ff430 "<VirtualHost> Pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->parent->parent->tag
0x562a6b2fb645 "permanent_pool"
Looks like the tag of the pool with the loop has been optimized out. I'll re-compile + rollout with less optimization.
parents sub pools:
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->tag
0x562a6b2fc48d "config_rec pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->sub_next->tag
0x562a6b2fc48d "config_rec pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->sub_next->sub_next->tag
0x562a6b2fc48d "config_rec pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->sub_next->sub_next->sub_next->tag
0x562a6b2fc48d "config_rec pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->sub_next->sub_next->sub_next->sub_next->tag
0x562a6b2fc48d "config_rec pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->sub_next->sub_next->sub_next->sub_next->sub_next->tag
0x562a6b2fc48d "config_rec pool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->sub_next->sub_next->sub_next->sub_next->sub_next->sub_next->tag
0x562a6b2fc4af "pr_config_add() subpool"
(gdb) p ((pool *)0x562a6cd73cb0)->parent->sub_next->sub_next->sub_next->sub_next->sub_next->sub_next->sub_next->sub_next->tag
0x562a6b2fbdd1 "table pool"
@manuelm Would it be possible to see your ProFTPD configuration, and proftpd -V output, so that I can see which modules (and which vhost configuration) are involved with your stack trace? Thanks!
Sure. Only users using the FTP vhost are looping. SFTP is working fine. All foreign modules should be available at https://manuel.mausz.at/coding/proftpd/
Compile-time Settings:
Version: 1.3.8rc3 (devel)
Platform: LINUX [Linux 5.15.74-gentoo-ud59 x86_64]
OS/Release:
NAME=Gentoo
ID=gentoo
PRETTY_NAME="Gentoo Linux"
VERSION_ID="2.8"
Built: Thu Oct 27 2022 10:45:45 CEST
Built With:
configure '--prefix=/usr' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--docdir=/usr/share/doc/proftpd-1.3.8_rc3' '--htmldir=/usr/share/doc/proftpd-1.3.8_rc3/html' '--libdir=/usr/lib64' '--localstatedir=/run/proftpd' '--sysconfdir=/etc/proftpd' '--disable-strip' '--enable-facl' '--enable-auth-file' '--enable-cap' '--disable-ctrls' '--disable-dso' '--disable-ident' '--enable-ipv6' '--disable-memcache' '--disable-ncurses' '--enable-nls' '--enable-openssl' '--disable-auth-pam' '--enable-pcre' '--enable-sodium' '--disable-tests' '--enable-trace' '--enable-shadow' '--enable-autoshadow' '--with-includes=/usr/include' '--with-modules=mod_facl:mod_ifsession:mod_sftp:mod_tls:mod_tls_shmcache:mod_wrap2:mod_wrap2_file:mod_vroot:mod_useralias:mod_fw_ipset:mod_log_login:mod_dovecot_anvil:mod_xclient' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu' 'CC=x86_64-pc-linux-gnu-gcc' 'CFLAGS=-march=nocona -O2 -pipe -g' 'LDFLAGS=-Wl,-O1 -Wl,--as-needed' 'CXXFLAGS=-march=nocona -O2 -pipe -g'
CFLAGS: -g2 -march=nocona -O2 -pipe -g -Wall -fno-omit-frame-pointer -fno-strict-aliasing
LDFLAGS: -Wl,-L$(top_srcdir)/lib,-L$(top_builddir)/lib -Wl,-O1 -Wl,--as-needed -rdynamic
LIBS: -lpcreposix -lpcre -lssl -lcrypto -lsodium -lipset -lcap -ldb-5.3 -lssl -lcrypto -lattr -lcrypt -pthread
Files:
Configuration File:
/etc/proftpd/proftpd.conf
Pid File:
/run/proftpd/proftpd.pid
Scoreboard File:
/run/proftpd/proftpd.scoreboard
Info:
+ Max supported UID: 4294967295
+ Max supported GID: 4294967295
Features:
+ Autoshadow support
- Controls support
+ curses support
- Developer support
- DSO support
+ IPv6 support
+ Largefile support
- Lastlog support
- Memcache support
- ncurses support
+ NLS support
+ OpenSSL support (OpenSSL 1.1.1q 5 Jul 2022)
+ PCRE support
- PCRE2 support
+ POSIX ACL support
- Redis support
+ Sendfile support
+ Shadow file support
+ Sodium support
+ Trace support
+ xattr support
Tunable Options:
PR_TUNABLE_BUFFER_SIZE = 1024
PR_TUNABLE_DEFAULT_RCVBUFSZ = 8192
PR_TUNABLE_DEFAULT_SNDBUFSZ = 8192
PR_TUNABLE_ENV_MAX = 2048
PR_TUNABLE_GLOBBING_MAX_MATCHES = 100000
PR_TUNABLE_GLOBBING_MAX_RECURSION = 8
PR_TUNABLE_HASH_TABLE_SIZE = 40
PR_TUNABLE_LOGIN_MAX = 256
PR_TUNABLE_NEW_POOL_SIZE = 512
PR_TUNABLE_PATH_MAX = 4096
PR_TUNABLE_SCOREBOARD_BUFFER_SIZE = 80
PR_TUNABLE_SCOREBOARD_SCRUB_TIMER = 30
PR_TUNABLE_SELECT_TIMEOUT = 30
PR_TUNABLE_TIMEOUTIDENT = 10
PR_TUNABLE_TIMEOUTIDLE = 600
PR_TUNABLE_TIMEOUTLINGER = 10
PR_TUNABLE_TIMEOUTLOGIN = 300
PR_TUNABLE_TIMEOUTNOXFER = 300
PR_TUNABLE_TIMEOUTSTALLED = 3600
PR_TUNABLE_XFER_SCOREBOARD_UPDATES = 10
ServerName "FTP Service"
ServerType standalone
ServerIdent on "FTP Server ready."
UseFtpUsers off
UseReverseDNS off
DefaultServer off
SocketBindTight on
ScoreboardScrub off
User ftp
Group ftp
MaxInstances 30
# see https://github.com/proftpd/proftpd/issues/1528
FSCachePolicy off
<Class xclient>
From 194.117.254._
</Class>
<Global>
DefaultRoot ~
Umask 022
VRootEngine on
MaxClientsPerUser 6
MaxClientsPerHost 6
MaxConnectionsPerHost 6
TimeoutLogin 30
TimeoutIdle 180
TimeoutStalled 180
TimeoutSession 1800
DelayEngine off
TimesGMT off
#IdentLookups off
WtmpLog off
TransferLog /var/log/xferlog
AllowOverwrite on
AllowStoreRestart on
# Groups that are not allowed to login
<Limit LOGIN>
AllowGroup ftponly
DenyAll
</Limit>
RequireValidShell off
# drop some default capabilities
RootRevoke on
CapabilitiesSet -CAP_CHOWN -CAP_SETUID
# custom modules
UserAliasDB /etc/proftpd/useralias
UserAliasToLower on
DovecotAnvil on
<IfClass xclient>
TLSRequired off
XClientEngine on
AllowForeignAddress on
</IfClass>
</Global>
# disable the main server
Port 0
<VirtualHost ::>
Port 21
PassivePorts 49200 49399
ListOptions "-a"
AuthUserFile /etc/proftpd/auth_passwd
AuthGroupFile /etc/proftpd/auth_group
# disable PAM as we have virtual users
<IfModule mod_auth_pam.c>
AuthPAM off
</IfModule>
AuthOrder mod_auth_file.c
TLSEngine on
TLSRequired on
TLSProtocol ALL -TLSv1 -TLSv1.1
TLSRSACertificateFile /etc/proftpd/certs/server.pem
TLSECCertificateFile /etc/proftpd/certs/server2.pem
TLSDHParamFile /etc/proftpd/dhparams.pem
TLSCipherSuite EECDH+AESGCM:EDH+AESGCM:EECDH+CHACHA20:EDH+CHACHA20:!AESCCM:!DSS
TLSServerCipherPreference on
# some ftp clients have problems with reusing the session
TLSOptions NoSessionReuseRequired
TLSRenegotiate none
# TLS cipher logging
TLSOptions NoSessionReuseRequired StdEnvVars
# custom modules
FwIPsetName4 ftpin4
FwIPsetName6 ftpin6
</VirtualHost>
<VirtualHost ::>
Port 33
AuthUserFile /etc/proftpd/auth_passwd
AuthGroupFile /etc/proftpd/auth_group
# disable PAM as we have virtual users
<IfModule mod_sftp_pam.c>
SFTPPAMEngine off
</IfModule>
AuthOrder mod_auth_file.c
SFTPEngine on
SFTPHostKey NoDSA
SFTPHostKey /etc/ssh/ssh_host_rsa_key
SFTPHostKey /etc/ssh/ssh_host_ecdsa_key
SFTPHostKey /etc/ssh/ssh_host_ed25519_key
SFTPKeyBlacklist /etc/proftpd/blacklist.dat
# see https://github.com/proftpd/proftpd/commit/b307fba5957b3846b19b2dc71e663941686d6902
#SFTPHostKeys ssh-ed25519 ecdsa-sha2-nistp256 rsa-sha2-256 rsa-sha2-512
SFTPCiphers [email protected] [email protected] \
aes256-ctr aes192-ctr aes128-ctr
SFTPKeyExchanges \
[email protected] \
ecdh-sha2-nistp521 ecdh-sha2-nistp384 ecdh-sha2-nistp256 \
diffie-hellman-group18-sha512 diffie-hellman-group16-sha512 \
diffie-hellman-group14-sha256 diffie-hellman-group-exchange-sha256
SFTPDigests [email protected] [email protected] [email protected] \
[email protected] hmac-sha2-256 hmac-sha2-512
SFTPAuthorizedUserKeys file:/etc/proftpd/authkeys/%u
# Allow the same number of authentication attempts as OpenSSH.
MaxLoginAttempts 6
</VirtualHost>
@manuelm Thanks! I hadn't realized there were non-standard modules involved in your build; good to know. Hopefully they aren't causing this behavior -- I'll take a look at them.
And, separately, as a mitigation measure, I've been thinking of ways to properly handle signal interruption in the Pool API internals, where normally signals are blocked (and handled later). That would be a way of interrupting this infinite loop, and protecting against future occurrences...but it would also be nice to understand and address the cause, too.
Looking at the details provided by @gojun077 and @jpfinger and @manuelm, I'm noticing that the mod_vroot module (which does use the FSIO API) is a common factor (as is mod_tls); let's see if I can make something of that. @manuelm which version of mod_vroot are you using?
@manuelm Thanks! I hadn't realized there were non-standard modules involved in your build; good to know. Hopefully they aren't causing this behavior -- I'll take a look at them.
Hopefully :-) - I've disabled my mod_useralias module. It isn't used anyway.
@manuelm which version of
mod_vrootare you using?
v0.9.9. However there's a new v0.9.10 tag. I can rebuild with that.