Deliverability problems with emails coming from microsoft [+fix/workaround in v0.0.13]
Incoming delivery attempts from microsoft are aborted (by them) during SMTP STARTTLS. Current workaround is disabling session tickets in our TLS config. Commit upcoming, more details will follow.
See https://www.xmox.nl/b/#22c8911bf3f768931d93f599b0eb03882d1c78e3 for a binary with the workaround.
Thanks to @mdavids for reporting.
Incoming deliveries were aborted by microsoft's smtp client during starttls. I have an @outlook.com account and an office365 account, and testing showed messages did not come through. I am not getting DSNs or have access to error logs, but mdavids got an error "451 4.4.0 Security status Renegotiate". An initial hunch that renegotiation was in play (which the Go TLS stack doesn't implement) didn't lead to a solution. The connections turned out to be TLS1.3, which doesn't support renegotiation at all.
After some added TLS debugging and testing, the workaround is to disable session tickets in the Go TLS config for the server. I started receiving TLS reports from microsoft with "validation failure" errors (no further details) on Oct 24. They were about MTA-STS, which I checked and couldn't find any issues. (One limitation of TLS reporting is that there's no way to classify an error as just "TLS error". It can only be classified as a TLSA (DANE) error or an MTA-STS error, which can put you on the wrong track when analyzing problems). Anyway, after looking at the TLS messages being exchanged, I noticed the "session ticket message" the Go TLS library sends immediately after its "server finished" message. The TLS1.3 RFC seems to say that such messages can only be sent after having read a "client finished" message, https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.1. Maybe Microsoft recently updated its TLS stack to be more strict, rejecting connections with messages coming in at a moment when not allowed. Probably something more is going on, I would expect the impact to be larger and more well-known.
I have TLS reports without failures from Microsoft on Oct 23, so that points to changes on 24th. Support for incoming DANE was announced at Oct 28: https://techcommunity.microsoft.com/blog/exchange/announcing-general-availability-of-inbound-smtp-dane-with-dnssec-for-exchange-on/4281292. It's for their incoming, not outgoing email. But any update may have come with more tls stack changes.
For completeness: The tcp connection is aborted by remote during the TLS handshake, after mox has written the "server finished" and "sessionticket" messages. Remote doesn't write any bytes in response (no tls alert or client finished message).
And TLS reports. Still good on Oct 23:
{
"organization-name": "Microsoft Corporation",
"date-range": {
"start-datetime": "2024-10-23T00:00:00Z",
"end-datetime": "2024-10-23T23:59:59Z"
},
"contact-info": "[email protected]",
"report-id": "133742711803146449+ueber.net",
"policies": [
{
"policy": {
"policy-type": "tlsa",
"policy-string": [
"[\"3 1 1 5C046FF012891B5F0D6176024C5C25FF486A7C12B8000FDF8B418AB3ECF6D309\",\"3 1 1 CEC87FB33D2A7499CA78E824E59B77531AC1FDEC7378FC81FCE7E5D213A364AB\"]"
],
"policy-domain": "ueber.net"
},
"summary": {
"total-successful-session-count": 7,
"total-failure-session-count": 0
}
},
{
"policy": {
"policy-type": "sts",
"policy-string": [
"version: STSv1",
"mode: enforce",
"mx: mail.axillis.nl",
"max_age: 604800"
],
"policy-domain": "ueber.net"
},
"summary": {
"total-successful-session-count": 7,
"total-failure-session-count": 0
}
}
]
}
Errors on Oct 24:
{
"organization-name": "Microsoft Corporation",
"date-range": {
"start-datetime": "2024-10-24T00:00:00Z",
"end-datetime": "2024-10-24T23:59:59Z"
},
"contact-info": "[email protected]",
"report-id": "133743605086584199+ueber.net",
"policies": [
{
"policy": {
"policy-type": "tlsa",
"policy-string": [
"[\"3 1 1 5C046FF012891B5F0D6176024C5C25FF486A7C12B8000FDF8B418AB3ECF6D309\",\"3 1 1 CEC87FB33D2A7499CA78E824E59B77531AC1FDEC7378FC81FCE7E5D213A364AB\"]"
],
"policy-domain": "ueber.net"
},
"summary": {
"total-successful-session-count": 2,
"total-failure-session-count": 0
}
},
{
"policy": {
"policy-type": "sts",
"policy-string": [
"version: STSv1",
"mode: enforce",
"mx: mail.axillis.nl",
"max_age: 604800"
],
"policy-domain": "ueber.net"
},
"summary": {
"total-successful-session-count": 2,
"total-failure-session-count": 6
},
"failure-details": [
{
"result-type": "validation-failure",
"receiving-mx-hostname": "mail.axillis.nl",
"receiving-ip": "2a02:2770:12:0:21a:4aff:fee0:fa47",
"failed-session-count": 3,
"failure-reason-code": "VALIDATION_FAILURE"
},
{
"result-type": "validation-failure",
"receiving-mx-hostname": "mail.axillis.nl",
"receiving-ip": "84.22.96.237",
"failed-session-count": 3,
"failure-reason-code": "VALIDATION_FAILURE"
}
]
}
]
}
And the logging of a failing session:
Nov 01 13:19:56 mail mox[260590]: l=info m="new connection" pkg=smtpserver remote=52.101.43.122:46292 local=84.22.96.237:25 submission=false tls=false listener=public cid=192618eb71c delta="124.654µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="LS: 220 mail.axillis.nl ESMTP mox v0.0.12\r\n" pkg=smtpserver cid=192618eb71c delta="49.665µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="RC: EHLO SJ2PR03CU001.outbound.protection.outlook.com\r\n" pkg=smtpserver cid=192618eb71c delta=144.37905ms
Nov 01 13:19:56 mail mox[260590]: l=debug m="smtp command result" pkg=smtpserver kind=smtp cmd=ehlo code=250 ecode= duration="39.779µs" cid=192618eb71c delta="120.851µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="LS: 250-mail.axillis.nl\r\n250-PIPELINING\r\n250-SIZE 104857600\r\n250-STARTTLS\r\n250-ENHANCEDSTATUSCODES\r\n250-8BITMIME\r\n250-LIMITS RCPTMAX=1000\r\n250 SMTPUTF8\r\n" pkg=smtpserver cid=192618eb71c delta="26.945µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="RC: STARTTLS\r\n" pkg=smtpserver cid=192618eb71c delta=144.395457ms
Nov 01 13:19:56 mail mox[260590]: l=debug m="smtp command result" pkg=smtpserver kind=smtp cmd=starttls code=220 ecode=2.0.0 duration="515.3µs" cid=192618eb71c delta="625.824µs"
Nov 01 13:19:56 mail mox[260590]: l=trace m="LS: 220 2.0.0 go! (H0zC1Oe0Fiok3krEZ63f3w)\r\n" pkg=smtpserver cid=192618eb71c delta="23.203µs"
Nov 01 13:19:56 mail mox[260590]: l=debug m="starting tls server handshake" pkg=smtpserver cid=192618eb71c delta="551.909µs"
Nov 01 13:19:57 mail mox[260590]: l=info m="connection closed" err="starttls handshake: EOF (io error)" pkg=smtpserver cid=192618eb71c delta=295.626141ms
See golang/go#70232
The fix is in mox release v0.0.13, released just now.
For the record: https://list.mailop.org/private/mailop/2024-November/029764.html