modoboa icon indicating copy to clipboard operation
modoboa copied to clipboard

Amavis crashed every 20 messages: (!)Amavis::END: DB unregistering failed

Open gianks opened this issue 4 years ago • 9 comments

Impacted versions

1.17.0

  • OS Type: Debian
  • OS Version: 11
  • Database Type: PostgreSQL
  • Database version: X.y
  • Modoboa: 1.17.0
  • installer used: Yes
  • Webserver: Nginx

Steps to reproduce

Receive 20 or more emails.

Current behavior

Repeated tests show Amavis crashing exactly after 20 processed emails with message: amavis[XX]: (XX-20) (!)Amavis::END: DB unregistering failed:

Expected behavior

Amavis keeps running without any intervention.

Video/Screenshot link (optional)

gianks avatar Mar 16 '22 11:03 gianks

A possible solution seems to be setting $max_server = 2 instead of 1 in /etc/amavis/conf.d/50-user (generated by modoboa). I did not investigate extensively why, but the magical number 20 is defined within amavis-new scritp, as maximum child requests.

I suspect that using a single server the program terminates when the first child arrives at 20 instead of spawning a new one, but it's just a guess.

gianks avatar Mar 16 '22 13:03 gianks

Hi, I got the same issue, as a workaround I added two lines into the service file /etc/systemd/system/multi-user.target.wants/amavis.service to restart the service:

[Service]
+Restart=always
+RestartSec=3

vingeni avatar Mar 21 '22 13:03 vingeni

How much system ressources did you provide your modoboa machine?

MrGeneration avatar Mar 24 '22 13:03 MrGeneration

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 25 '22 21:05 stale[bot]

Hello, I had the same problem in debian 11.3 with installer, I fixed incrementing $max_server = 2 in /etc/amavis/conf.d/50-user, at the moment I don't have to apply systemd configuration as mentioned by vingeni.

blackhold avatar Jun 10 '22 15:06 blackhold

Same issue here. Lots of resources available in principle for one domain/~30 mailboxes, but amavis was restarting - and Monit reporting it - quite manically. I doubled memory, pursued the logs and eventually arrived here. No reports since setting $max_server = 2.

However I did note that this value is reported to need to match the value in postfix configuration (see, eg here: https://docs.iredmail.org/concurrent.processing.html ). That is smtp-amavis value should equal $max_server value.

I couldn't actually find this config value though. Is it true? Did I misunderstand?

IppyD avatar Jun 27 '22 15:06 IppyD

same issue here, it crashes after 20 mails, I've set up the max_server to 2 to see if it works. @IppyD did you try adding smtp-amavis_destination_concurrency_limit = 2 to main.cf maybe ?

Spitfireap avatar Jul 17 '22 17:07 Spitfireap

not sure if related, but on my machine with 8GB RAM and 6 cpu cores I did set $max_servers to 8 and now I have no more timeout problems on load. Before, if I sent 5-10MB .exe files to 10 different aliases on my machine half of the mails would time out and the other mailserver would resend half of them because they had the "mail queue write error" aka postfix/smtpd[3972]: warning: timeout talking to proxy inet:[127.0.0.1]:10024

However I did note that this value is reported to need to match the value in postfix configuration (see, eg here: https://docs.iredmail.org/concurrent.processing.html ). That is smtp-amavis value should equal $max_server value.

I was looking into this too, I checked both master and main.cf and I could not identify the service used for amavis. There is a flag -o smtpd_proxy_options=speed_adjust after -o smtpd_proxy_filter=inet:[127.0.0.1]:10024 with smtpd so I guess this is what automatically adjusts the amount of instances that are spun up. Either way, it works fine without either configuration change. To my understanding adding smtp-amavis_destination_concurrency_limit = 2 would serve no purpose since there is no smtp-amavis service this would refer to

nicosqc avatar Jul 21 '22 13:07 nicosqc

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 20 '22 16:09 stale[bot]

Hello,

I started to receive tens of "Out: 451 4.3.0 Error: queue file write error" mail from the new 2.0.2 Modoboa server I just installed. I installed the "lmtp patch" (using description in https://github.com/modoboa/modoboa-installer/pull/425).

It seems that it fails as soon as the server receives more than two or three mails at the same time. However, I have a correct server with 8Go of RAM.

Regards,

stefaweb avatar Sep 28 '22 16:09 stefaweb

Hello,

No more error since yesterday.

I think it would be better to have the configuration using lmtp by default because it seems that even with few messages coming to the server, the system through the proxy seems unreliable.

Regards,

stefaweb avatar Sep 29 '22 10:09 stefaweb

FYI: https://www.postfix.org/SMTPD_PROXY_README.html#pros_cons

tonioo avatar Sep 30 '22 07:09 tonioo

Hello,

The server is an Scaleway VM cloud instance (PRO2-XXS) with 2 cores 64Bits, SSD block storage and 8Go de RAM. No swap. System is Debian bullseye uptodate.

I tried again yesterday with the default proxy setup. The system crashes and loses mails as soon as more than 2 or 3 mails are processed at the same time. Whether they come from an auth login or not. The worst is that we are not warned. We just see it in mail.log.

Regards,

stefaweb avatar Sep 30 '22 07:09 stefaweb

Have you checked your master.cf file to check the number of allowed connections for the smtp service? (first one)

tonioo avatar Sep 30 '22 08:09 tonioo

I have the default config (modboboa-installer) + the change for amavis lmtp.

smtp inet n - - - 1 postscreen

I wrote something wrong. When the system crashes, I receive a system email with the error "queue file write error" but not for all lost emails.

Objet: Postfix SMTP server: errors from sonic304-22.consmr.mail.ir2.yahoo.com[77.238.179.147]

Transcript of session follows.

Out: 220 mail.xxxxx.fr ESMTP
In:  EHLO sonic304-22.consmr.mail.ir2.yahoo.com
Out: 250-mail.xxxxx.fr
Out: 250-PIPELINING
Out: 250-SIZE 11534336
Out: 250-ETRN
Out: 250-STARTTLS
Out: 250-AUTH PLAIN LOGIN
Out: 250-AUTH=PLAIN LOGIN
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  STARTTLS
Out: 220 2.0.0 Ready to start TLS
In:  EHLO sonic304-22.consmr.mail.ir2.yahoo.com
Out: 250-mail.xxxxxx.fr
Out: 250-PIPELINING
Out: 250-SIZE 11534336
Out: 250-ETRN
Out: 250-AUTH PLAIN LOGIN
Out: 250-AUTH=PLAIN LOGIN
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  MAIL FROM:<[[email protected]](mailto:[email protected])>
Out: 250 2.1.0 Ok
In:  RCPT TO:<[[email protected]](mailto: [email protected])>
Out: 250 2.1.5 Ok
In:  DATA
Out: 354 End data with <CR><LF>.<CR><LF>
Out: 451 4.3.0 Error: queue file write error
In:  QUIT
Out: 221 2.0.0 Bye

EDIT: and this in the log.

Sep 29 15:06:22 mailhub postfix/smtpd[67769]: timeout after END-OF-MESSAGE from mx0b-002e3701.pphosted.com[148.163.143.35]
Sep 29 19:22:31 mailhub postfix/smtpd[196091]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Sep 29 19:59:41 mailhub postfix/smtpd[210417]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]

stefaweb avatar Sep 30 '22 08:09 stefaweb

And what is your base RAM consumption when everything is running?

tonioo avatar Sep 30 '22 08:09 tonioo

I have this with the amavis lmtp config.

root@mailhub:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       2.0Gi       2.4Gi        23Mi       3.4Gi       5.5Gi
Swap:             0B          0B          0B

root@mailhub:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 2472916 141996 3449880    0    0    17    22   79  117  4  1 94  0  0

stefaweb avatar Sep 30 '22 08:09 stefaweb

Have you also updated the maxproc setting of the smtp-amavis service? It should be equal to $max_server

tonioo avatar Sep 30 '22 08:09 tonioo

Oops, I'm lost.

I will put the default settings without lmtp and I will do new tests.

stefaweb avatar Sep 30 '22 09:09 stefaweb

Here is the consumption with the default settings just after a reboot.

I will now wait to see if the errors start again.

` root@mailhub:~# free -h total used free shared buff/cache available Mem: 7.8Gi 1.8Gi 4.1Gi 21Mi 1.9Gi 5.7Gi Swap: 0B 0B 0B

root@mailhub:~# vmstat procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 0 4326024 36996 1933840 0 0 3404 505 171 247 8 4 86 3 0 `

Here are the default settings after the config via modoboa-installer.

/etc/amavis/conf.d/50-user
$inet_socket_port = [9998, 10024, 10026];
$max_servers = 1;

/etc/postfix/master.cf
smtp      inet  n       -       -       -       1       postscreen
smtpd     pass  -       -       -       -       -       smtpd
  -o smtpd_proxy_filter=inet:[127.0.0.1]:10024
  -o smtpd_proxy_options=speed_adjust

Nothing added in /etc/postfix/main.cf.

stefaweb avatar Sep 30 '22 09:09 stefaweb

Well, that didn't drag.

I did not receive an error mail on postmaster@.

Sep 30 11:06:41 mailhub postfix/smtpd[3547]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]

stefaweb avatar Sep 30 '22 09:09 stefaweb

And if you increase max_servers and max proc to 2?

tonioo avatar Sep 30 '22 09:09 tonioo

I'm going to try.

It seems to only do this when there is more than one mail to process. If there is only one, I don't get the error.

Sep 30 11:01:35 mailhub postfix/smtpd[3547]: connect from localhost[127.0.0.1]
Sep 30 11:01:35 mailhub postfix/smtpd[3547]: 8118021112: client=localhost[127.0.0.1], orig_client=82-64-xxxx-71.subs.proxad.net[82.64.xxx.71]
Sep 30 11:01:41 mailhub postfix/smtpd[3547]: 5F8A921113: client=localhost[127.0.0.1], orig_client=listes.xxxx.fr[62.210.xxx.5]
Sep 30 11:06:41 mailhub postfix/smtpd[3547]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Sep 30 11:06:41 mailhub postfix/smtpd[3547]: disconnect from localhost[127.0.0.1] ehlo=1 xforward=2 mail=2 rcpt=3 data=2 commands=10

stefaweb avatar Sep 30 '22 09:09 stefaweb

It doesn't like to set the values to 2. I put 1 back for max_servers and max proc to.

Sep 30 11:35:07 mailhub postfix/postscreen[1931]: fatal: service smtp requires a process limit of 1

stefaweb avatar Sep 30 '22 09:09 stefaweb

It's smtp-amavis service that you must update, not the smtp one

tonioo avatar Sep 30 '22 09:09 tonioo

And if we use theses commands in Amavis?

  $smtp_connection_cache_on_demand = 1;
  $smtp_connection_cache_enable = 1;

I found this old text while searching with Google.

A result of connection caching when it happens the session is held
in hope for a next task, which doesn't come soon enough.
No harm done. The feature can be disabled for low traffic sites.
- smtp client connection caching is a new feature which allows smtp client
code in amavisd to keep a SMTP session to MTA open after forwarding a
message or a notification, so that a next mail message that needs to be
sent by this child process can avoid re-establishing a session and the
initial greeting/EHLO (and TLS) handshake.

A current value of a global settings $smtp_connection_cache_enable
controls whether a session will be retained after forwarding a message
or not. Its default initial value is true.

A global setting $smtp_connection_cache_on_demand controls whether amavisd
is allowed to dynamically change the $smtp_connection_cache_enable setting
according to its estimate of the message frequency. The heuristics is
currently very simple: if time interval between a previous task completion
by this child process and the arrival of a current message is 5 seconds
or less, the $smtp_connection_cache_enable is turned on (which will affect
the next message); if the interval is 15 seconds or more, it is turned off.
The default value of the $smtp_connection_cache_on_demand is true, thus
enabling the adaptive behaviour.

On a busy server the connection caching can save some processing time.
Savings are substantial if client-side TLS is enabled, otherwise just a
few milliseconds are saved. On an idle server the feature may unnecessarily
keep sessions to MTA open (until MTA times them out), so one can disable
the feature by setting both controls to false (to 0 or undef).

To monitor the connection caching effectiveness, some SNMP-like counters
were added, so amavisd-agent may display something like:

OutConnNew 2764 319/h 98.2 % (OutMsgs)
OutConnQuit 2521 291/h 89.5 % (OutMsgs)
OutConnReuseFail 7 1/h 0.2 % (OutMsgs)
OutConnReuseRecent 21 2/h 0.7 % (OutMsgs)
OutConnReuseRefreshed 31 4/h 1.1 % (OutMsgs)
OutConnTransact 2816 325/h 100.0 % (OutMsgs)

stefaweb avatar Sep 30 '22 09:09 stefaweb

It's smtp-amavis service that you must update, not the smtp one

I no longer have smtp-amavis in master.cf because I switched to the default config without the smtp-amavis lmtp service. When I use the smtp-amavis lmtp service, I no longer have these errors.

When I used the lmtp config, I had 2 for max_servers and the max proc for the smtp-amavis service.

stefaweb avatar Sep 30 '22 09:09 stefaweb

And if we use theses commands in Amavis?

  $smtp_connection_cache_on_demand = 1;
  $smtp_connection_cache_enable = 1;

It still does with these settings. I removed them.

I switched back to lmtp config as long as there are no other tests to do.

stefaweb avatar Sep 30 '22 11:09 stefaweb

Again me.

I tried parsing the last logs with default proxy config + above two cache commands enabled.

I can't figure out if the "timeout after END-OF-MESSAGE" is an error or normal behavior with caches. The mail arrived well in the quarantine of the GUI if needed.

I did not have any other such error during this short test.

I removed the lines of the logs which are useless.

Sep 30 12:46:31 mailhub postfix/smtpd[17493]: connect from domain.org[62.210.xxx.xx]
Sep 30 12:46:32 mailhub postfix/verify[17695]: cache proxy:btree:/var/lib/postfix/verify_cache full cleanup: retained=10 dropped=0 entries
Sep 30 12:46:32 mailhub postfix/smtpd[17493]: NOQUEUE: client=domain.org[62.210.xxx.xx]
Sep 30 12:46:33 mailhub postfix/smtpd[17719]: connect from localhost[127.0.0.1]
Sep 30 12:46:33 mailhub postfix/smtpd[17719]: 4FF2221113: client=localhost[127.0.0.1], orig_client=domain.org[62.210.xxx.xx]
Sep 30 12:46:33 mailhub postfix/cleanup[17720]: 4FF2221113: message-id=<[email protected]>
Sep 30 12:46:33 mailhub opendkim[1315]: 4FF2221113: no signing table match for '[email protected]'
Sep 30 12:46:33 mailhub opendkim[1315]: 4FF2221113: no signature data
Sep 30 12:46:33 mailhub postfix/qmgr[1305]: 4FF2221113: from=<>, size=4595, nrcpt=1 (queue active)
Sep 30 12:46:33 mailhub amavis[9459]: (09459-01) Passed CLEAN {RelayedInbound}, [62.210.xxx.xx]:38418 [62.210.xxx.xx] <> -> <[email protected]>, Message-ID: <[email protected]>, mail_id: A13-Lh82CBo2, Hits: 0.002, size: 4117, queued_as: 4FF2221113, 1295 ms
Sep 30 12:46:33 mailhub postfix/smtpd[17493]: proxy-accept: END-OF-MESSAGE: 250 2.0.0 from MTA(smtp:[127.0.0.1]:10025): 250 2.0.0 Ok: queued as 4FF2221113; from=<> to=<[email protected]> proto=ESMTP helo=<cabanis.domain.org>
Sep 30 12:46:33 mailhub postfix/smtpd[17493]: disconnect from domain.org[62.210.xxx.xx] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Sep 30 12:51:33 mailhub postfix/smtpd[17719]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Sep 30 12:51:33 mailhub postfix/smtpd[17719]: disconnect from localhost[127.0.0.1] ehlo=1 xforward=1 mail=1 rcpt=1 data=1 commands=5

stefaweb avatar Sep 30 '22 11:09 stefaweb

Hello,

I redid the tests with the default proxy configuration.

If I add these two commands in /etc/amavis/conf.d/50-user the problem disappears.

$smtp_connection_cache_on_demand = 1;
$smtp_connection_cache_enable = 1;

Nothing else to change.

I'm going to let it run for several days to be sure.

stefaweb avatar Oct 01 '22 07:10 stefaweb