Emails have stopped working since 1.0.0.4 update
Operating System (OS/VERSION):
Type here, e.g. Ubuntu 18.04
VestaCP Version:
Type here, e.g. 1.0.0.4
Installed Software (what you got with the installer):
php-fpm, apache, nginx, mysql
Steps to Reproduce:
Trying to send and receive emails
Other Notes:
I can still send emails to the domain - I see them come up in the /cur folder, so they are getting through. But actually connecting and downloading doesn't work now. This is what I get now:
I see the following in rejectlog:
2021-11-03 07:57:38 dovecot_plain authenticator failed for (smtpclient.apple) [85.255.233.131]: 435 Unable to authenticate at present: authentication socket connection error
2021-11-03 07:58:06 dovecot_plain authenticator failed for (smtpclient.apple) [85.255.233.131]: 435 Unable to authenticate at present: authentication socket connection error
2021-11-03 07:58:42 dovecot_plain authenticator failed for (smtpclient.apple) [185.69.145.253]: 435 Unable to authenticate at present: authentication socket connection error
We have lots of disk space available, so I don't think that's the issue.
Any suggestions on what else to try?
Also, not sure if this is part of the main problem or not - but I see this on one of our catchall accounts:
2021-11-03 09:30:31 1miCbH-0005ab-IU == [email protected] <[email protected]> R=localuser T=local_delivery defer (-1): Malformed value "0MM" (expansion of "${extract{6}{:}{${lookup{$local_part}lsearch{/etc/exim4/domains/$domain/passwd}}}}M") in local_delivery transport
When someone is trying to email one of the forwarding addresses, I see this in rejectlog:
2021-11-03 09:47:24 dovecot_login authenticator failed for ([127.0.0.1]) [103.238.228.121]: 535 Incorrect authentication data ([email protected])
exim4.conf.template for the local delivery looks like:
local_delivery:
  driver = appendfile
  maildir_format
  maildir_use_size_file
  user = ${extract{2}{:}{${lookup{$local_part}lsearch{/etc/exim4/domains/$domain/passwd}}}}
  group = mail
  create_directory
  directory_mode = 770
  mode = 660
  use_lockfile = no
  delivery_date_add
  envelope_to_add
  return_path_add
  directory = "${extract{5}{:}{${lookup{$local_part}lsearch{/etc/exim4/domains/$domain/passwd}}}}/mail/$domain/$local_part"
  quota = ${extract{6}{:}{${lookup{$local_part}lsearch{/etc/exim4/domains/$domain/passwd}}}}M
  quota_warn_threshold = 75%
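The `Malformed value "0MM"` error implies that the value extracted for the quota already ends in `M` (here `0M`), so the `M` appended by the `quota` line produces `0MM`. Below is a minimal sketch for spotting such entries. Because `lsearch` returns the rest of the line after the key, `extract{6}` corresponds to the seventh colon-separated field of the full line; that layout is an assumption about the `passwd` file, and `bad_quota_accounts` is a hypothetical helper name.

```shell
# Hypothetical helper: print accounts whose quota field is not a plain
# integer, e.g. "0M", which would expand to "0MM" in the transport.
# Assumes the account name is field 1 and the quota is field 7 of each line.
bad_quota_accounts() {
  # $1: path to a domain passwd file (colon-separated fields)
  awk -F: '$7 !~ /^[0-9]+$/ { print $1 ": quota field is \"" $7 "\"" }' "$1"
}

# Example (example.com is a placeholder domain):
# bad_quota_accounts /etc/exim4/domains/example.com/passwd
```

Any account it lists is a candidate for the deferred deliveries shown in the log above.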
Maybe the problem is with ownership on the dovecot auth socket; I need to check it on our test server.
Are spamd and clamd installed on your server?
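One way to check the ownership theory is to look at the socket Exim uses to reach Dovecot. This is only a sketch: `socket_status` is a hypothetical helper, and the path in the example is an assumption; use whatever `server_socket` is set to in your Exim authenticators. It relies on GNU `stat`.

```shell
# Hypothetical helper: report whether a path is a UNIX socket and who owns it.
socket_status() {
  if [ ! -e "$1" ]; then
    echo "missing: $1"
  elif [ ! -S "$1" ]; then
    echo "not a socket: $1"
  else
    # GNU stat: print owner, group and octal mode of the socket
    stat -c 'socket: %n owner=%U:%G mode=%a' "$1"
  fi
}

# Example (assumed path, check server_socket in your exim config):
# socket_status /var/run/dovecot/auth-client
```

If the socket is missing, or the user Exim runs as cannot write to it, that would be consistent with the `authentication socket connection error` lines in the rejectlog.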
There is a pull request to fix this, and I think this happens only on Debian / Ubuntu.
Thanks @anton-reutov - yes we have clamd running on the server:
service clamav-freshclam status
● clamav-freshclam.service - ClamAV virus database updater
Loaded: loaded (/lib/systemd/system/clamav-freshclam.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2021-05-08 07:03:54 UTC; 5 months 26 days ago
Docs: man:freshclam(1)
man:freshclam.conf(5)
http://www.clamav.net/lang/en/doc/
Main PID: 764 (freshclam)
CGroup: /system.slice/clamav-freshclam.service
└─764 /usr/bin/freshclam -d --foreground=true
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: DON'T PANIC! Read http://www.clamav.net/documents/upgrading-clamav
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: WARNING: getpatch: Can't download main-60.cdiff from database.clamav.net
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: WARNING: getpatch: Can't download main-60.cdiff from database.clamav.net
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: WARNING: getpatch: Can't download main-60.cdiff from database.clamav.net
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: WARNING: getpatch: Can't download main-60.cdiff from database.clamav.net
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: ERROR: getpatch: Can't download main-60.cdiff from database.clamav.net
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: WARNING: Incremental update failed, trying to download main.cvd
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: ERROR: Can't download main.cvd from database.clamav.net
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: Giving up on database.clamav.net...
Nov 03 09:45:12 admin.chambresdhotes.org freshclam[764]: Update failed. Your network may be down or none of the mirrors listed in /etc/clamav/freshclam.conf is working. Check http://www.clamav.net/doc/mirrors-faq.html for possible reason
root@admin:/home/chambres/mail/chambresdhotes.org/andy/new# service clamav-daemon status
● clamav-daemon.service - Clam AntiVirus userspace daemon
Loaded: loaded (/lib/systemd/system/clamav-daemon.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2021-05-08 07:03:54 UTC; 5 months 26 days ago
Docs: man:clamd(8)
man:clamd.conf(5)
http://www.clamav.net/lang/en/doc/
Main PID: 768 (clamd)
CGroup: /system.slice/clamav-daemon.service
└─768 /usr/sbin/clamd --foreground=true
Nov 03 00:25:06 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 01:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 02:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 03:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 04:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 05:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 06:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 07:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 08:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Nov 03 09:25:07 admin.chambresdhotes.org clamd[768]: SelfCheck: Database status OK.
Maybe the problem is with ownership on the dovecot auth socket; I need to check it on our test server.
Thanks.
@Skamasle - do you have a link to the pull? Thanks :)
Ok I have managed to get the "catchall" working. I found this:
https://github.com/serghey-rodin/vesta/pull/2125
They mentioned something about quotas. We had that account set up as unlimited, but when I changed it to a fixed quota, it started accepting the catchall emails.
I still have the issue of not being able to send/receive on any of the IMAP/SMTP stuff with Exim, though :(
Any more ideas? I still can't send or receive :/ I can see the mails going into the /new folder in the /mail dir, but that's not much good if we can't get to them (or send replies) :(
What is the status of the dovecot service? (Output of the `service dovecot status` command?)
Ah ha - got it! `service dovecot status` came back fine, but `service exim4 status` gave:
`Nov 04 06:53:39 admin.chambresdhotes.org exim[8206]: write failed on panic log: length=102 result=-1 errno=28 (No space left on device)`
There is still 30 GB of space on the server, so maybe it crashed overnight when the updates/backups are done. I'll have to see if I can clean a bit more space up. Hopefully that's sorted it now :)
What about inodes ?
Run df -i
@Skamasle thanks for the reply. It's actually all working again now. The issue was more around disk space. While there was normally enough space, it was running out during the nightly backups (as there was only 25 GB free). I've run some cleanups on the server and got rid of 100 GB worth of crap we don't need any more (old images from listings). I guess it was just bad timing that it happened right after the update ;)
@youradds Do you use remote backup? If yes, then see https://github.com/serghey-rodin/vesta/issues/2136 if the local backup folder doesn't contain backups from the last few days.
Nah, we use local backups. We then have urbackup run as a backup client, which sends the files off to another server incrementally :) We do have the Vesta backups running as well, but they only cover the emails / log files / configs etc., and not the web files, as those are pretty large.
Good evening, is there a solution for this issue? I have the same problem with exim4 and many customers' messages in the queue. I'm not able to force the processing of the mails (the sending process times out).
Is there a way to roll back VestaCP to an older version (Ubuntu server)?
Thanks, Vasco
Good morning, I had the same space problem. What I did was: `cd /var/lib/fail2ban`, delete everything with `rm -rf *` (Ubuntu and CentOS), and then reboot; the disk is no longer full. Then, to send and receive emails, I used `rpm -Uvh --oldpackage exim-4.93-3.el7.x86_64.rpm` (CentOS 7), and everything works perfectly.
Yup the f2b log file can get pretty large! For me though, it was genuinely out of space due to old images that were not being removed from a website. Check out your space 👍
df -h ./
and inodes:
df -hi ./
Also check out /var/log/exim4/rejectlog and mainlog. For me, I had errors in there about:
2021-11-03 09:30:31 1miCbH-0005ab-IU == [email protected] <[email protected]> R=localuser T=local_delivery defer (-1): Malformed value "0MM" (expansion of "${extract{6}{:}{${lookup{$local_part}lsearch{/etc/exim4/domains/$domain/passwd}}}}M") in local_delivery transport
Which turned out to be an issue with the email accounts. "Unlimited" space no longer worked for email accounts, so I just had to set a large amount, e.g. 10 GB, for each of the accounts that were set as unlimited before.
`ncdu` can also help you find large dirs, and see if you can spot big bits of space being used.
I deleted the fail2ban file. The messages are still in the queue after a reboot. If I run df -h ./ I get:
root@vestacp:/var/lib/fail2ban# df -h ./
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 98G 61G 33G 66% /
Do you know if a similar solution (rpm -Uvh --oldpackage exim-4.93-3.el7.x86_64.rp) is available for Ubuntu?
Do you get any error logs in the files I said? Also what do you get for "service exim4 status" (and dovecot too)
Also, check `df -hi .` (`i` for inodes). If you have a lot of small files it can cause your inodes to run out / get low.
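To check block usage and inode usage in one pass, something like the following could work. It is a rough sketch: `over_threshold` is a hypothetical helper, the 90% limit is an arbitrary choice, and it assumes the six-column layout printed by `df -P` / `df -Pi` with no spaces in device names or mount points.

```shell
# Hypothetical helper: read `df -P` or `df -Pi` output on stdin and print
# every mount point whose usage is at or above the given percentage.
over_threshold() {
  # $1: threshold in percent (integer)
  awk -v limit="$1" 'NR > 1 {
    use = $5; sub(/%$/, "", use)          # column 5 is Capacity / IUse%
    if (use ~ /^[0-9]+$/ && use + 0 >= limit) print $6 " " $5
  }'
}

# Block usage and inode usage for all mounted filesystems:
# df -P  | over_threshold 90
# df -Pi | over_threshold 90
```

An empty result from both commands means neither disk space nor inodes are close to exhausted at the moment it is run; it says nothing about usage peaks during nightly jobs.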
Here are the outputs:
root@vestacp:/# tail /var/log/exim4/rejectlog
2021-11-09 11:25:02 H=spruce-goose-at.twitter.com [199.59.150.89] X=TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128 CV=no F=<[email protected]> rejected RCPT <[email protected]>: Unrouteable address
2021-11-09 11:33:28 H=spring-chicken-bm.twitter.com [199.16.156.178] X=TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128 CV=no F=<[email protected]> rejected RCPT <[email protected]>: Unrouteable address
2021-11-09 11:52:39 dovecot_login authenticator failed for (xxx.ch) [45.133.1.6]: 535 Incorrect authentication data ([email protected])
2021-11-09 12:21:01 dovecot_login authenticator failed for (xxx.ch) [45.133.1.6]: 535 Incorrect authentication data ([email protected])
2021-11-09 12:24:28 SMTP protocol synchronization error (input sent without waiting for greeting): rejected connection from H=[178.73.215.171] input="\377\375\003\377\373\030\377\373\037\377\373 \377\373!\377\373"\377\373'\377\375\005\377\373#"
2021-11-09 12:49:17 dovecot_login authenticator failed for (xxx.ch) [45.133.1.6]: 535 Incorrect authentication data ([email protected])
2021-11-09 13:17:33 dovecot_login authenticator failed for (xxx.ch) [45.133.1.6]: 535 Incorrect authentication data ([email protected])
2021-11-09 13:45:49 dovecot_login authenticator failed for (xxx.ch) [45.133.1.6]: 535 Incorrect authentication data ([email protected])
2021-11-09 14:24:32 dovecot_login authenticator failed for (xxx.ch) [45.133.1.6]: 535 Incorrect authentication data ([email protected])
2021-11-09 14:54:07 dovecot_login authenticator failed for (xxx.ch) [45.133.1.6]: 535 Incorrect authentication data ([email protected])
root@vestacp:/# service exim4 status
● exim4.service - LSB: exim Mail Transport Agent
Loaded: loaded (/etc/init.d/exim4; generated)
Active: active (running) since Tue 2021-11-09 14:48:38 CET; 27min ago
Docs: man:systemd-sysv-generator(8)
Process: 1914 ExecStart=/etc/init.d/exim4 start (code=exited, status=0/SUCCESS)
Tasks: 4 (limit: 4915)
CGroup: /system.slice/exim4.service
├─2241 /usr/sbin/exim4 -bd -q30m
├─2242 /usr/sbin/exim4 -qG
├─4625 /usr/sbin/exim4 -qG
└─4626 /usr/sbin/exim4 -qG
Nov 09 14:48:37 vestacp.extranet systemd[1]: Starting LSB: exim Mail Transport Agent...
Nov 09 14:48:37 vestacp.extranet exim4[1914]: * Starting MTA
Nov 09 14:48:38 vestacp.extranet exim4[1914]: ...done.
Nov 09 14:48:38 vestacp.extranet exim4[1914]: ALERT: exim paniclog /var/log/exim4/paniclog has non-zero size, mail system possibly broken
Nov 09 14:48:38 vestacp.extranet systemd[1]: Started LSB: exim Mail Transport Agent.
root@vestacp:/# service dovecot status
● dovecot.service - Dovecot IMAP/POP3 email server
Loaded: loaded (/lib/systemd/system/dovecot.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-11-09 14:48:32 CET; 27min ago
Docs: man:dovecot(1)
http://wiki2.dovecot.org/
Main PID: 1348 (dovecot)
Tasks: 13 (limit: 4915)
CGroup: /system.slice/dovecot.service
├─1348 /usr/sbin/dovecot -F
├─1590 dovecot/anvil
├─1591 dovecot/log
├─1601 dovecot/config
├─2301 dovecot/imap-login
├─2304 dovecot/imap
├─2305 dovecot/imap-login
├─2306 dovecot/imap
├─4073 dovecot/imap-login
├─4076 dovecot/imap
├─4079 dovecot/imap-login
├─4080 dovecot/imap
└─4559 dovecot/imap
Nov 09 14:48:32 vestacp.extranet systemd[1]: Started Dovecot IMAP/POP3 email server.
root@vestacp:/#
Nov 09 14:48:38 vestacp.extranet exim4[1914]: ALERT: exim paniclog /var/log/exim4/paniclog has non-zero size
What does that have in it? If memory serves me, you don't get delivery if the paniclog has content in it.
root@vestacp:/# tail /var/log/exim4/paniclog
2021-11-09 11:06:21 1mkO1F-0004dQ-4z spam acl condition: all spamd servers failed
Check spamassassin then :)
service spamassassin status
root@vestacp:/# service spamassassin status
● spamassassin.service - Perl-based spam filter using text analysis
Loaded: loaded (/lib/systemd/system/spamassassin.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-11-09 14:48:37 CET; 39min ago
Process: 1325 ExecStart=/usr/sbin/spamd -d --pidfile=/var/run/spamd.pid $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 1729 (spamd)
Tasks: 3 (limit: 4915)
CGroup: /system.slice/spamassassin.service
├─1729 /usr/bin/perl -T -w /usr/sbin/spamd -d --pidfile=/var/run/spamd.pid --create-prefs --max-children 5 --helper-home-dir
├─1912 spamd child
└─1913 spamd child
Nov 09 14:48:37 vestacp.extranet spamd[1729]: spamd: server successfully spawned child process, pid 1913
Nov 09 14:48:37 vestacp.extranet spamd[1729]: prefork: child states: SI
Nov 09 14:48:37 vestacp.extranet systemd[1]: Started Perl-based spam filter using text analysis.
Nov 09 14:48:37 vestacp.extranet spamd[1729]: prefork: child states: II
Nov 09 15:01:44 vestacp.extranet spamd[1912]: spamd: connection from 127.0.0.1 [127.0.0.1]:40456 to port 783, fd 6
Nov 09 15:01:44 vestacp.extranet spamd[1912]: spamd: setuid to debian-spamd succeeded
Nov 09 15:01:44 vestacp.extranet spamd[1912]: spamd: checking message <[email protected]> for debian-spamd:119
Nov 09 15:01:46 vestacp.extranet spamd[1912]: spamd: clean message (0.0/5.0) for debian-spamd:119 in 2.1 seconds, 21996 bytes.
Nov 09 15:01:46 vestacp.extranet spamd[1912]: spamd: result: . 0 - DKIM_SIGNED,DKIM_VALID,HTML_MESSAGE,MIME_BASE64_TEXT,RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_PASS,URIBL_BLOCKED scantime=2.1,size=21996,user=debian-spamd,uid=119,required_score=5.0,rhost=127.0.0.1,raddr=127.0.0.1,rport=40456,mid=<[email protected]>,autolearn=ham autolearn_force=no
Nov 09 15:01:46 vestacp.extranet spamd[1729]: prefork: child states: II
echo "" > /var/log/exim4/paniclog
That'll clear the log file out. Then try a reboot
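Note that `echo ""` writes a newline, so the file ends up one byte long rather than empty, and the ALERT quoted above is about the paniclog having "non-zero size". A minimal sketch that keeps a copy of the old contents and truncates the file to exactly zero bytes; `archive_and_clear` is a hypothetical helper name, not something shipped with Exim or Vesta.

```shell
# Hypothetical helper: keep a timestamped copy of a log, then truncate it
# to exactly zero bytes (`: >` writes nothing, unlike `echo ""`).
archive_and_clear() {
  # $1: log file, e.g. /var/log/exim4/paniclog
  [ -s "$1" ] || return 0                 # nothing to do if missing or empty
  cp -p "$1" "$1.$(date +%Y%m%d-%H%M%S)" && : > "$1"
}

# Example:
# archive_and_clear /var/log/exim4/paniclog
```

Keeping the copy means the original panic message is still available if the problem comes back.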
Done, but the mails are still in the queue. I tried to send a new mail (ID 1mkSNJ-0000tM-3w; the result in the mainlog follows):
root@vestacp:/var/log/exim4# tail mainlog
2021-11-09 15:44:53 1mk7bt-0005z7-Hy == [email protected] R=dnslookup T=remote_smtp defer (-53): retry time not reached for any host for 'gmail.com'
2021-11-09 15:44:53 1mjxYr-0000PR-8N == [email protected] R=dnslookup T=remote_smtp defer (-53): retry time not reached for any host for 'gmail.com'
2021-11-09 15:44:53 1mjys3-0001lx-JJ == [email protected] R=dnslookup T=remote_smtp defer (-53): retry time not reached for any host for 'gmail.com'
2021-11-09 15:44:53 1mk6F5-0003oZ-T0 == [email protected] R=dnslookup T=remote_smtp defer (-53): retry time not reached for any host for 'gmail.com'
2021-11-09 15:44:53 1mkRpf-0001BY-0v == [email protected] R=dnslookup T=remote_smtp defer (-53): retry time not reached for any host for 'gmail.com'
2021-11-09 15:44:53 1mjysD-0001lx-QG == [email protected] R=dnslookup T=remote_smtp defer (-53): retry time not reached for any host for 'gmail.com'
2021-11-09 15:44:53 1mkP70-0007Eq-47 == [email protected] <[email protected]> R=dnslookup T=remote_smtp defer (-53): retry time not reached for any host for '123.ch'
2021-11-09 15:45:25 1mkSNJ-0000tM-3w <= [email protected] H=mob-194-230-145-111.cgn.sunrise.net (smtpclient.apple) [194.230.145.111] P=esmtpsa X=TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256 CV=no A=dovecot_plain:[email protected] K S=626 [email protected]
2021-11-09 15:47:04 1mkPlu-000864-NQ H=mxbw-bluewin-ch.hdb-cs04.ellb.ch [195.186.227.50] Connection timed out
2021-11-09 15:47:35 1mkSNJ-0000tM-3w H=mx-eu.mail.am0.yahoodns.net [188.125.72.74] Connection timed out
root@vestacp:/var/log/exim4#
Can you curl domains?
curl google.com
I saw this one earlier:
https://forum.vestacp.com/viewtopic.php?f=10&t=19158&e=1&view=unread#unread
I have curl on mine out of the box,
Hard to know without seeing more. Is the rest of your server working? Did it only stop working with the Vesta update? BTW I'm only a server novice, so only able to help as much as I can :) Someone with more knowledge may be able to help more
Sorry, wrong test. Yes, I'm able to curl google.com (I got the HTML). The issue is only present for outgoing mail; the rest of Vesta seems to work properly. It seems that the outgoing connections time out each time. Can someone advise on how to roll back Vesta to an earlier version on Ubuntu (apt install), since I'm facing issues in production?
Bit of a curveball - but does your OS have the latest LetsEncrypt CAs? They expired end of last month I believe, and I had to update the root server certificates to get LE working again
Yes it has the latest versions of CAs, HTTPS websites are also working properly. I've also made a fresh install of vestacp on a new ubuntu server but the problem is the same. Any suggestion on how to identify the issue for the outgoing connections (affected by timeout)? If needed I can grant access to this specific server in order to analyze the issue (no data present) for a while.
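A successful `curl google.com` only shows that outbound HTTP works; it does not exercise outbound SMTP, which by default goes to port 25 and which some hosting providers filter. The sketch below pulls the hosts that timed out from the mainlog and tries a plain TCP connection to each on port 25. Both function names are hypothetical, and `probe_smtp` assumes a netcat that supports `-z` and `-w`.

```shell
# Hypothetical helper: list the unique remote hosts (name and IP) that
# appear in "Connection timed out" lines of an Exim mainlog read on stdin.
timed_out_hosts() {
  sed -n 's/.* H=\([^ ]*\) \[\([0-9a-fA-F.:]*\)] Connection timed out.*/\1 \2/p' | sort -u
}

# Hypothetical helper: try a plain TCP connection to port 25 on a host.
# Assumes a netcat that supports -z (connect only) and -w (timeout, seconds).
probe_smtp() {
  if nc -z -w 10 "$1" 25; then
    echo "$1: port 25 reachable"
  else
    echo "$1: port 25 NOT reachable"
  fi
}

# Example:
# timed_out_hosts < /var/log/exim4/mainlog | while read -r name ip; do
#   probe_smtp "$ip"
# done
```

If every probe fails while HTTP works, the timeouts are more likely caused by network filtering than by the Vesta update, which would also fit the same behaviour appearing on a fresh install.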