helm-charts [BUG] Authentication credentials invalid

Describe the bug Sometimes I cannot connect to mailu webmail, it shows me login window I can enter login and password, then it loads empty roundcube interface with red message on the bottom:

I can hit F5 several times and still no change, but when I hit logout and try again with same user/password (which comes from google password manager BTW) it's ok, everything loads. If I try sample wrong login data then it shows me that login/password is incorrect. Problem is the same on different browser/computer/phone. On all of them simply logout + login helps.

Checked pod logs and there is: mailu-frontend:

2023/10/23 14:59:42 [info] 19#19: *157171 client 10.42.4.219:44702 connected to 0.0.0.0:10143
2023/10/23 14:59:42 [info] 19#19: *157171 client login failed: "Authentication credentials invalid" while in http auth state, client: 10.42.4.219, server: 0.0.0.0:10143, login: "[email protected]"
10.42.1.89 - - [23/Oct/2023:14:59:42 +0200] "POST /webmail/?_task=mail&_action=refresh HTTP/1.1" 200 4226 "https://mail.example.net.pl/webmail/?_task=mail&_mbox=INBOX.github" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

mailu admin:

10.42.3.1 - - [23/Oct/2023:14:58:35 +0200] "GET /ping HTTP/1.1" 200 0 "-" "kube-probe/1.27"
192.168.192.1 - - [23/Oct/2023:14:58:42 +0200] "GET /internal/auth/user HTTP/1.0" 200 0 "https://mail.example.net.pl/webmail/?_task=mail&_mbox=INBOX.github" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
10.42.3.31 - - [23/Oct/2023:14:58:42 +0200] "GET /internal/auth/email HTTP/1.0" 200 0 "-" "-"
10.42.3.1 - - [23/Oct/2023:14:58:45 +0200] "GET /ping HTTP/1.1" 200 0 "-" "kube-probe/1.27"

rspamd complains about outdated cache, but this is probably something not that important:

2023-10-23 14:59:11 #195(controller) <yyr5hc>; map; rspamd_map_save_http_cached_file: saved data from http://mail-mailu-admin.mail.svc.cluster.local:8080/internal/rspamd/local_domains in /var/lib/rspamd/0090cd993765020aeebacae7c23c8447927ec9db.map, 119 bytes
2023-10-23 14:59:11 #195(controller) <yyr5hc>; map; rspamd_kv_list_fin: read hash of 6 elements from http://mail-mailu-admin.mail.svc.cluster.local:8080/internal/rspamd/local_domains
2023-10-23 14:59:38 #193(rspamd_proxy) <yyr5hc>; map; rspamd_map_common_http_callback: need to reread cached map triggered by http://mail-mailu-admin.mail.svc.cluster.local:8080/internal/rspamd/local_domains (1698065402 our modify time, 1698065951 cached modify time)
2023-10-23 14:59:38 #193(rspamd_proxy) <yyr5hc>; map; rspamd_map_read_cached: http://mail-mailu-admin.mail.svc.cluster.local:8080/internal/rspamd/local_domains: read map data cached 79 bytes
2023-10-23 14:59:38 #193(rspamd_proxy) <yyr5hc>; map; rspamd_kv_list_fin: read hash of 6 elements from http://mail-mailu-admin.mail.svc.cluster.local:8080/internal/rspamd/local_domains

Checking same thing after logout:

front:

2023/10/23 15:53:03 [info] 14#14: *158372 client 10.42.4.219:41900 connected to 0.0.0.0:10143
2023/10/23 15:53:03 [info] 14#14: *158372 client logged in, client: 10.42.4.219, server: 0.0.0.0:10143, login: "[email protected]", upstream: 10.43.5.82:143
2023/10/23 15:53:03 [info] 14#14: *158372 proxied session done, client: 10.42.4.219, server: 0.0.0.0:10143, login: "[email protected]", upstream: 10.43.5.82:143
10.42.1.89 - - [23/Oct/2023:15:53:03 +0200] "POST /webmail/?_task=mail&_action=refresh HTTP/1.1" 200 4135 "https://mail.example.net.pl/webmail/?_task=mail&_mbox=INBOX" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

admin:

192.168.192.1 - - [23/Oct/2023:15:53:03 +0200] "GET /internal/auth/user HTTP/1.0" 200 0 "https://mail.example.net.pl/webmail/?_task=mail&_mbox=INBOX" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
10.42.3.31 - - [23/Oct/2023:15:53:03 +0200] "GET /internal/auth/email HTTP/1.0" 200 0 "-" "-"
10.42.3.33 - - [23/Oct/2023:15:53:03 +0200] "GET /internal/dovecot/passdb/[email protected] HTTP/1.1" 200 68 "-" "Python/3.10 aiohttp/3.8.5"
10.42.3.33 - - [23/Oct/2023:15:53:03 +0200] "GET /internal/dovecot/userdb/[email protected] HTTP/1.1" 200 36 "-" "Python/3.10 aiohttp/3.8.5"

now also there is trace on dovecot:

INFO:root:Connect 
Oct 23 15:53:03 imap-login: Info: Login: user=<[email protected]>, method=PLAIN, rip=10.42.4.219, lip=10.42.3.31, mpid=899119, session=<IdeAimIIrKMKKgTb>
Oct 23 15:53:03 imap([email protected])<899119><IdeAimIIrKMKKgTb>: Info: Disconnected: Logged out in=93 out=1048 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0

Environment K3s + mailu from this helm chart, longhorn for storage.

Additional context It's also nothing about any csrf token, I confirmed this once that refreshing whole page before submiting data is not helping. Others have some problems with internal services routing but here there is maybe something about session and something that is not refreshed. It happens on different devices, os, on all of them re-logging helps. I also set replica count to 1 for each services.

Oct 23 '23 14:10 dominch

I occasionally run into this problem, too, using k3s + longhorn storage. I have Mailu configured to use a single RWX volume.

Nov 08 '23 05:11 rome-user

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Dec 09 '23 02:12 github-actions[bot]

This is still an issue :( Happens on few different devices (android, windows, linux)

Dec 11 '23 12:12 dominch

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Jan 11 '24 02:01 github-actions[bot]

This still is an issue, I got it about once per five logins, it's not browser issue (happens on few devices) but something on server side :/

Jan 11 '24 08:01 dominch

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Feb 12 '24 02:02 github-actions[bot]

This still is an issue, I got it about once per five logins, it's not browser issue (happens on few devices) but something on server side :/

I'm encountering the same experience.

Feb 12 '24 07:02 kooskaspers

I also still have this issue, first login after some time cause this error, all I need to do then is to log out and log in again (no issues). I was able to get this error on several devices (phone, pc, tablet), also with fresh browsers with no data (cookies). It seems to be something on server side.

Feb 12 '24 08:02 dominch

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Mar 15 '24 02:03 github-actions[bot]

This is still an issue :(

Mar 15 '24 08:03 dominch

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Apr 15 '24 04:04 github-actions[bot]

This is still an issue :(

Apr 15 '24 07:04 dominch

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

May 16 '24 02:05 github-actions[bot]

This is still an issue

May 16 '24 06:05 kooskaspers

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Jun 16 '24 02:06 github-actions[bot]

Still an issue

Jun 21 '24 08:06 rome-user

Did you check in Mailu repo if the issue is not happening for users not using the Helm chart? Anyways, I'll be releasing a new chart soon that'll support Mailu 2024.06.3, you should try it out and let me know if the issue still occurs with that version.

Jun 24 '24 00:06 fastlorenzo

I spent some time debugging this issue half year ago, as far as I remember I checked other bit newer images with no luck. Since then I was focused on different browsers, computers to confirm that it's for sure something at server side, something about session, hard to reproduce on demand, but happens from time to time and it's enough to relogin to skip it.

Of course with new chart I'll update and report back if it's already solved. I don't know how to reproduce that on demand so I;ll need some time to hit issue.

Jun 24 '24 07:06 dominch

Thanks for the feedback, I'll close this issue as cannot reproduce, feel free to reopen if you manage to reproduce it in the latest version

Jun 28 '24 21:06 fastlorenzo

This issue is still occurring with chart version 2.1.1 and app version 2024.06.11. The issue is not—assuming it is the same cause—related to Roundcube. It also occours for other clients when connecting directly to port 465/587. In my case, an external application is connecting to Mailu to send mails which fails seemingly random.

While I have no experience with the Python frameworks used for Mailu, it looks to me, like this issue might be caused by issues with database connection handling (Log: (psycopg2.OperationalError) server closed the connection unexpectedly). Mailu seems to try to use a stale pool connection which is no longer valid and does not handle the exception (see https://docs.sqlalchemy.org/en/20/core/pooling.html#dealing-with-disconnects).

I've checked the PostgreSQL logs and there are no errors at the time the issue occurs. But PG reports "could not receive data from client: Connection reset by peer" quite frequently at other times. I've also verified that the PG pod was not restarted around the time of the error.

Here all relevant log entries:

mailu-postgres-0 | 2024-09-03 03:43:19.966 GMT [268474] LOG:  could not receive data from client: Connection reset by peer
mailu-postgres-0 | 2024-09-03 04:46:41.057 GMT [274079] LOG:  could not receive data from client: Connection reset by peer
mailu-postgres-0 | 2024-09-03 04:51:35.969 GMT [274076] LOG:  could not receive data from client: Connection reset by peer
mailu-postgres-0 | 2024-09-03 04:51:35.969 GMT [274077] LOG:  could not receive data from client: Connection reset by peer
mailu-postgres-0 | 2024-09-03 04:51:35.969 GMT [274078] LOG:  could not receive data from client: Connection reset by peer
mailu-postgres-0 | 2024-09-03 04:53:14.269 GMT [274508] LOG:  could not receive data from client: Connection reset by peer
mailu-admin | [2024-09-03 04:57:50,574] WARNING in nginx: Invalid user '...': (psycopg2.OperationalError) server closed the connection unexpectedly
mailu-front | Sep 03 04:57:52 submission-login: Info: Disconnected: Connection closed (auth failed, 1 attempts in 2 secs): user=<...>, method=LOGIN, rip=..., rport=46260, lip=10.233.116.10, lport=465, TLS: Disconnected
external-app | 2024-09-03T04:57:52.077Z  WARN 1 --- [o-8080-exec-686] Mail could not be sent.
external-app | 
external-app | org.springframework.mail.MailAuthenticationException: Authentication failed
external-app |       at org.springframework.mail.javamail.JavaMailSenderImpl.doSend(JavaMailSenderImpl.java:402)
external-app | Caused by: jakarta.mail.AuthenticationFailedException: 535 5.7.8 Authentication failed.
external-app | 
external-app |       at org.eclipse.angus.mail.smtp.SMTPTransport$Authenticator.authenticate(SMTPTransport.java:954)
mailu-postgres-0 | 2024-09-03 05:56:35.357 GMT [280096] LOG:  could not receive data from client: Connection reset by peer
mailu-postgres-0 | 2024-09-03 06:21:59.069 GMT [281674] LOG:  could not receive data from client: Connection reset by peer

Sep 03 '24 11:09 dgerhardt

@dgerhardt I also reproduced this on newer version :/

Definitely something on server side because this issue is present on all devices including cellphones and computers, I noticed that there are no errors if I'm logging from time to time only on one device, when using with more than one (phone + laptop) i got it almost everyday. Maybe there is something more cached, some kind of token. Still did not had that much time to do more investigation :/

Sep 03 '24 11:09 dominch

@dominch Could you reopen the issue for better visibility?

I noticed that there are no errors if I'm logging from time to time only on one device, when using with more than one (phone + laptop) i got it almost everyday.

Interesting observation. This would also be explainable with a database connection issue if there is an inactivity-related timeout.

Sep 03 '24 11:09 dgerhardt

@dgerhardt probably I have no permission to reopen it :( maybe @fastlorenzo ?

This issue is hard to reproduce on demand, but I'm quite sure that something related to more than one endpoint and maybe some specific log-in time. I'm moving some resources now (including mailu) and planned to find out what is needed to hit this issue. I think that two endpoints are needed and some specific time period, which may be connected with DB timeout You found.

Sep 03 '24 12:09 dominch