OpenDKIM icon indicating copy to clipboard operation
OpenDKIM copied to clipboard

OpenDKIM Filter: accept() returned invalid socket (Suspect Hanging UDP DNS Connections, possibly reaching max open files)

Open conathan opened this issue 3 years ago • 5 comments
trafficstars

Noticed via monitoring (nagios) that postfix sometimes was hanging on a port25 health check, which I was able to correlate to seeing OpenDKIM Filter: accept() returned invalid socket in the logs

Mar 30 04:05:53 aggelos-front3.vc.bravenet.com opendkim1276100: OpenDKIM Filter: accept() returned invalid socket (Numerical result out of range), try again

  • This is intermittant at the times I find the postfix issue, but suspect that's due to it making use of the remaining file handles.

Suspecting this may be due to too many open file handles, but need to catch this in the act one more time to confirm.

ss -anp is showing a ton of open connections via udp port 53 to our bind dns server (EL8)

root@aggelos-front3 [/root]# ss -anp | grep 683086 | sed 's/ *$//' u_dgr UNCONN 0 0 * 206888104 * 13641 users:(("opendkim",pid=683086,fd=6)) udp ESTAB 0 0 172.16.30.192:60461 172.16.0.4:53 users:(("opendkim",pid=683086,fd=167)) udp ESTAB 0 0 172.16.30.192:60473 172.16.0.201:53 users:(("opendkim",pid=683086,fd=130)) udp ESTAB 0 0 172.16.30.192:40004 172.16.0.4:53 users:(("opendkim",pid=683086,fd=105)) udp ESTAB 0 0 172.16.30.192:52310 172.16.0.4:53 users:(("opendkim",pid=683086,fd=37)) udp ESTAB 0 0 172.16.30.192:60505 172.16.0.5:53 users:(("opendkim",pid=683086,fd=220)) udp ESTAB 0 0 172.16.30.192:56424 172.16.0.4:53 users:(("opendkim",pid=683086,fd=53)) udp ESTAB 0 0 172.16.30.192:44227 172.16.0.5:53 users:(("opendkim",pid=683086,fd=61)) udp ESTAB 0 0 172.16.30.192:40135 172.16.0.201:53 users:(("opendkim",pid=683086,fd=125)) udp ESTAB 0 0 172.16.30.192:60616 172.16.0.4:53 users:(("opendkim",pid=683086,fd=186)) udp ESTAB 0 0 172.16.30.192:60633 172.16.0.201:53 users:(("opendkim",pid=683086,fd=25)) udp ESTAB 0 0 172.16.30.192:40154 172.16.0.201:53 users:(("opendkim",pid=683086,fd=71)) udp ESTAB 0 0 172.16.30.192:36067 172.16.0.4:53 users:(("opendkim",pid=683086,fd=42)) udp ESTAB 0 0 172.16.30.192:52527 172.16.0.4:53 users:(("opendkim",pid=683086,fd=218)) udp ESTAB 0 0 172.16.30.192:52605 172.16.0.4:53 users:(("opendkim",pid=683086,fd=97)) udp ESTAB 0 0 172.16.30.192:60818 172.16.0.4:53 users:(("opendkim",pid=683086,fd=27)) udp ESTAB 0 0 172.16.30.192:60824 172.16.0.4:53 users:(("opendkim",pid=683086,fd=95)) udp ESTAB 0 0 172.16.30.192:52643 172.16.0.201:53 users:(("opendkim",pid=683086,fd=157)) udp ESTAB 0 0 172.16.30.192:60892 172.16.0.4:53 users:(("opendkim",pid=683086,fd=83)) udp ESTAB 0 0 172.16.30.192:36361 172.16.0.4:53 users:(("opendkim",pid=683086,fd=101)) udp ESTAB 0 0 172.16.30.192:56866 172.16.0.4:53 users:(("opendkim",pid=683086,fd=152)) udp ESTAB 0 0 172.16.30.192:48692 172.16.0.201:53 users:(("opendkim",pid=683086,fd=235)) udp ESTAB 0 0 172.16.30.192:48713 172.16.0.5:53 users:(("opendkim",pid=683086,fd=82)) udp ESTAB 0 0 172.16.30.192:36500 172.16.0.4:53 users:(("opendkim",pid=683086,fd=93)) udp ESTAB 0 0 172.16.30.192:48800 172.16.0.4:53 users:(("opendkim",pid=683086,fd=198))

Getting a count of open UDP port 53 sockets

root@aggelos-front3 [/root]# ss -anp | grep 683086 | sed 's/ *$//' | grep 'udp.*53' | wc -l 138 (About a day or two after restarting opendkim)

Checking 8 hrs later, udp port 53 count was 215

This was on AlmaLinux 8.5, 2.11.0-0.28

Also reported at https://bugzilla.redhat.com/show_bug.cgi?id=2070441

conathan avatar Mar 31 '22 19:03 conathan

I'm facing this issue too since I started using 2.11.0. I've used 2.10.3 for years in the same environment and never had this problem. I upgraded from version 2.10.3 to 2.11.0 in last April and immediately started having this problem. libmilter was compiled with SM_CONF_POLL. Thx.

KIC-8462852 avatar Jun 14 '22 08:06 KIC-8462852

Hey, I'd like to reproduce this if possible, which I guess means firing up an Arch linux VM. Could you let me know:

  • How opendkim was installed (package or compile, if it was your own compile, what configure args you used).

  • What kind of ram your system has, and if there's anything else significant running that might be hitting this exhaustion level.

  • What's your upstream resolver -- clearly it's off-box, and on a nat'd subnet -- are you doing DNSSEC validation? Can you say what software it's running?

  • Your openDKIM conf file.

  • More log snippets than a single log line when this occurs (i.e. how constant is the error logged, etc).

If any of this is stuff you'd rather not have in a public ticket, please feel free to email me: gushi at gushi dot org with the details instead.

thegushi avatar Jan 07 '23 06:01 thegushi

Note: As of this reply, I have not confirmed it is still happening (worked around this by restarting opendkim whenever there are more then 100 open connections to port 53),
opendkim:30 * * * * root test 0$(ss -anp | grep '53.*opendkim' | sed 's/ *$//' | wc -l) -gt 100 && systemctl restart opendkim@bravehost && echo "systemctl restart opendkim@bravehost" > /dev/null opendkim:31 * * * * root test 0$(ss -anp | grep '53.*opendkim' | sed 's/ *$//' | wc -l) -gt 100 && systemctl restart opendkim && echo systemctl restart opendkim > /dev/null

I believe still happening as I can see 76 connections established to our dns servers. root@aggelos-front1 [/etc/cron.d]# ss -anp | grep '53.*opendkim' | sed 's/ *$//' | wc -l 76

Hey, I'd like to reproduce this if possible, which I guess means firing up an Arch linux VM. Could you let me know:

* How opendkim was installed (package or compile, if it was your own compile, what configure args you used).

For my setup, This was installed via AlmaLinux 8.7 using opendkim from epel. https://src.fedoraproject.org/rpms/opendkim (Fedora EPEL 8)

* What kind of ram your system has, and if there's anything else significant running that might be hitting this exhaustion level.

It is running inside a virtual machine with 9GB of ram, and 4GB of swap. (We got 4 such VM's, at the time of this ticket, all active servers were having the same behavior). The Server hosting the VM is running 32GB (4x8GB), ECC Ram, on a Intel S5520HC.

* What's your upstream resolver -- clearly it's off-box, and on a nat'd subnet -- are you doing DNSSEC validation?  Can you say what software it's running?

Currently AlmaLinux 9.1, Bind 9.16.23-5.el9, Prior, would have on AlmaLinux 8.6 most likely (Unsure at time of ticket, but currently 9.11.36-5 for EL8.7).

* Your openDKIM conf file.

root@aggelos-front1 [/etc]# cat opendkim.bravehost.conf | grep -v '^#' | grep -v ^$ PidFile /var/run/opendkim/opendkim.bravehost.pid Mode sv Syslog yes SyslogSuccess yes LogWhy yes UserID opendkim:opendkim Socket inet:8892@localhost Umask 002 SendReports no SoftwareHeader yes Canonicalization relaxed/relaxed Selector default MinimumKeyBits 1024 KeyFile /etc/opendkim/keys/default.private KeyTable dsn:mysql://REDACTED SigningTable dsn:mysql://REDACTED InternalHosts refile:/etc/opendkim/TrustedHosts OversignHeaders From

* More log snippets than a single log line when this occurs (i.e. how constant is the error logged, etc).

Currently no additional logs then I have provided above, (and as this was affecting our customer's mail, I would prefer not reintroducing this), but that one line was the only one present related to the issue. My belief is that it reached the max number of open files handles, around something like 1000 or 1024.

If any of this is stuff you'd rather not have in a public ticket, please feel free to email me: gushi at gushi dot org with the details instead.

Speaking of, to change the threshold on our side to restart opendkim when it exceeds 600 connections vs 100, to see if this is still happening (Believe so, as it has 74 open udp 53 connections).

conathan avatar Jan 09 '23 19:01 conathan

How opendkim was installed (package or compile, if it was your own compile, what configure args you used):

configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --libexecdir=/usr/sbin --infodir=/usr/share/info --mandir=/usr/share/man --enable-filter --enable-shared --disable-static --disable-debug --disable-codecoverage --disable-query_cache --with-openssl --with-milter

cat /proc/meminfo MemTotal: 16547484 kB MemFree: 10074156 kB MemAvailable: 13517520 kB Buffers: 254884 kB Cached: 4354740 kB SwapCached: 0 kB (...)

DNSSEC validation: Yes DNS resolver: 1.1.1.1 and local recursive resolver using BIND 9.16.36.

cat /etc/opendkim.conf AuthservID [REDACTED] Mode sv Domain [REDACTED] Selector default KeyFile /etc/opendkim/default.private Canonicalization relaxed/relaxed SignatureAlgorithm rsa-sha256 SignHeaders From,To,Subject,Date,Reply-To,Sender,Message-Id InternalHosts /etc/opendkim/internal-hosts.txt Socket local:/var/run/opendkim/opendkim-milter.socket Userid nobody:nobody AlwaysAddARHeader yes AutoRestart yes AutoRestartRate 10/1h Umask 0117 Syslog yes SyslogSuccess yes LogResults yes Nameservers 1.1.1.1,[REDACTED]

Thx.

KIC-8462852 avatar Jan 11 '23 13:01 KIC-8462852

Following up on my earlier post, to confirm still a issue

root@aggelos-front1 [/root]# ss -anp | grep '53.*opendkim' | sed 's/ *$//' | wc -l 325

(Small snippet of the output)

udp   ESTAB      0       0                                   172.16.30.189:40403               172.16.0.4:53        users:(("opendkim",pid=81905,fd=83))
udp   ESTAB      0       0                                   172.16.30.189:48599               172.16.0.5:53        users:(("opendkim",pid=81905,fd=305))
udp   ESTAB      0       0                                   172.16.30.189:48615               172.16.0.4:53        users:(("opendkim",pid=81905,fd=390))
udp   ESTAB      0       0                                   172.16.30.189:56819               172.16.0.4:53        users:(("opendkim",pid=81905,fd=24))
udp   ESTAB      0       0                                   172.16.30.189:40557               172.16.0.4:53        users:(("opendkim",pid=81905,fd=290))
udp   ESTAB      0       0                                   172.16.30.189:40568               172.16.0.4:53        users:(("opendkim",pid=81905,fd=56))
udp   ESTAB      0       0                                   172.16.30.189:40673               172.16.0.4:53        users:(("opendkim",pid=81905,fd=512))

conathan avatar Jan 31 '23 17:01 conathan