WIP EAI (SMTPUTF8 for SMTP, UTF8=ACCEPT for IMAP)
Hi,
this is a WIP patch to support EAI. Some remarks:
- Dovecot generally accepts just-send-8 and passes it on. This does the same, because IMO a patch should agree with the project. Therefore this does no downgrading or anything like that.
- I haven't implemented the UTF8 syntax for APPEND. The parser in cmd-append.c confused me and now a high-priority task prevents me from spending more time on this bit. This definitely should be done.
- I seem to remember that Dovecot had a nice automatic test suite back in hg days, is that in another repo?
- LMTP always accepts EAI messages, Submission accepts them if and only if the upstream relay does.
- Dovecot has some clever logic to choose format for strings it emits via IMAP. This doesn't extend that logic, which doesn't harm correctness but means that Dovecot may send something via literal that could be a quoted-string.
I'd appreciate comments.
Thanks, we'll take a look.
Hi, one big issue with this code is that it mainly just makes dovecot accept UTF8, yet it does very little in the way of actually handling it. It also does not deal with unicode normalizations required for recipient handling.
So in short, while we do really appreciate the effort, we would need this to actually take care of the utf-8 we are accepting, and not just wish for the best and store it, which seems to be missing from your TODO list.
Thanks for your quick response.
The patch doesn't do anything about the UTF8 because RFCs 589x, 6531 and 6855 demand nothing of a server such as Dovecot. To name three examples:
- The RFCs permit fields such as subject to contain UTF8 instead of 2047-encoded UTF8, but do not change anything about the content. Only the encoding of the header field is changed, and I believe Dovecot already handled it since it handles just-send-8 so well.
- The RFCs require normalisation on user input, which matters for webmail systems but not for a Submission server. Again, Dovecot escapes untouched. You could also say that MTAs that route mail should normalise, but Dovecot doesn't do that either. (Note that this patch doesn't change the set of destination addresses supported by LMTP.)
- I'm not sure what RFC 9051 says about e.g. search. Perhaps a server that supports unicode content should normalise. But RFC 6855 says nothing about that, so normalisation is not relevant to an implementation of RFC 6855. 6855 simply changes the search syntax, it doesn't change the rules for executing the search at all. It's good if a search for "grå" matches 0067 0072 0061 030A as well as 0067 0072 00E5, but that applies to the unicode bodies Dovecot already supports just as much as to addresses.
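To make the two spellings of "grå" concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It compares strings after NFC normalisation, which is one way a search could treat both code point sequences as equal. The helper name `matches` is hypothetical, and nothing here describes how Dovecot actually executes SEARCH.

```python
import unicodedata

# The two code point sequences mentioned above.
DECOMPOSED = "gra\u030a"   # 0067 0072 0061 030A (a + combining ring above)
PRECOMPOSED = "gr\u00e5"   # 0067 0072 00E5      (precomposed a-ring)


def matches(needle: str, haystack: str) -> bool:
    """Substring match after NFC normalisation (hypothetical helper)."""
    nfc_needle = unicodedata.normalize("NFC", needle)
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    return nfc_needle in nfc_haystack


# Without normalisation the two strings differ; with it, either form
# of the search key finds either form of the stored text.
print(DECOMPOSED == PRECOMPOSED)
print(matches(DECOMPOSED, "mail about " + PRECOMPOSED))
```

The same comparison applies to message bodies and to addresses alike, which is the point made in the last bullet.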
Actually, let's do it differently. Why don't you just make some imaptest tests that fail, and I'll make them pass. That'll explain concisely what you have in mind. Does that sound good to you?
One more question. This PR's goal is limited in scope to EAI support like gmail's: Users can receive mail from and send mail to grå@grå.org and deal with that mail as with all mail about grå. It does not aim to host a domain such as grå.org itself. Is that an acceptable scope to you?
I could do a separate PR to support hosting if you want to, but I'd really prefer that to be a separate PR.
The problem is mostly that if I send email to ℌdž@domain.com, it should go to hdž@domain.com. If there is no normalization done, these two are considered different users. This is handled by https://www.rfc-editor.org/rfc/rfc8265, which applies to sender/recipient names too.
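A rough Python sketch of the kind of mapping meant here, assuming that NFKC normalisation followed by case folding is an acceptable stand-in for illustration. RFC 8265 defines its own profiles with additional rules (including disallowed characters), so this is not an implementation of that RFC, and the function names are hypothetical.

```python
import unicodedata


def canonical_localpart(localpart: str) -> str:
    """Map a local part to a comparison key.

    Assumption: NFKC + case folding only approximates the RFC 8265
    rules; a real implementation would follow the RFC's profile.
    """
    return unicodedata.normalize("NFKC", localpart).casefold()


def same_user(addr_a: str, addr_b: str) -> bool:
    """Compare two addresses by canonical local part and lowercased domain."""
    local_a, _, domain_a = addr_a.rpartition("@")
    local_b, _, domain_b = addr_b.rpartition("@")
    return (canonical_localpart(local_a) == canonical_localpart(local_b)
            and domain_a.lower() == domain_b.lower())


# U+210C (black-letter capital H) has a compatibility mapping to "H",
# so both spellings end up with the same key.
print(same_user("\u210cd\u017e@domain.com", "hd\u017e@domain.com"))
```

Without a step like this, the two addresses compare as different byte strings and would be treated as different users.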
That said, this also needs to handle other headers as well. This is governed by https://www.rfc-editor.org/rfc/rfc6532. Before we start coming up with examples, why not take a moment to read these?
I know both of them. I should ;)
I believe Dovecot escapes the requirements in 8265 at present. (Being able to host grå.org would change this.) 8265 applies to software that accepts addresses from outside, does some sort of comparison and then does something differently based on the result of the comparison. Dovecot as it stands accepts addresses, but does nothing differently based on the result. For example, the submission relays all addresses to the backend server.
6532 changes a number of limits and lifts a number of restrictions that matter to this PR, but AFAICT Dovecot already lifted those long ago, so no changes were necessary.
6532 has implications for DSN generators. As far as I can tell, none of the code I touched will generate DSNs, but the Sieve code will. Are you saying that you'd require Sieve to be updated as well?
Well we derive the username from the destination address, so at minimum that has to work, as well as SEARCH FROM some@address.
We can consider doing this in increments, as long as we clearly refuse stuff we cannot handle correctly.
Oh and sieve is these days pretty wedded to dovecot, so it should not break either.
I could make a followup PR to support local users with UTF8 names. The goal of this PR is more restricted (the same level of support as gmail currently has — [email protected] can send mail to foo@grå.org, but there cannot be a grå@gmail.com).
It would be much easier for me to get management approval for more work if gmail-level support is merged. The followup would then add support for local users with EAI addresses and for Sieve.
This PR is a WIP and I can put more W into it, but I can't put unlimited work into it without an assurance that the work will be merged, see?
ok. I'll have to discuss this internally then.
At minimum SEARCH FROM has to work. Even if we only accept utf8 senders.
I absolutely agree that SEARCH FROM has to work. (It's a bit tricky, e.g. if the message contains a DKIM or PGP signature over the bytes as received.) How do you prefer automated tests in PRs such as this?
we have internal ci with tests (provide some shell script / python script) and unit tests if possible. imaptest script is ok too, although not sure if it can test this, esp. the problem cases.
Could you possibly link to an example in the style you most prefer?
we'll have to adapt the ci test in any case.
Sadly, RFC 8265 is required even in the gmail-level support I had in mind for this PR. Sieve tests such as address "from" :is :all require it.
Hi,
I wrote automatic tests for this (and did a little more work on the PR too).
I believe that the optimal way to support EAI in Dovecot is with three PRs. This is one of them, and logically the first.
- This PR, which is sufficient to converse with non-ASCII addresses. Much of what's needed already worked. For example, I have automatic tests that show that unicode normalisation already works as necessary for IMAP SEARCH and S/MIME. That already worked, hence there's no code for that in the PR.
- A PR for pigeonhole, which needs to support tests such as address "from" :is :all, autoreplies, forwarding, vacation etc.
- A second PR for dovecot/core, to support hosting non-ASCII addresses.
I can write the second and third, but I need this merged, or an assurance that it will be merged.
Thanks. We'll take a look and let you know.
@cmouse Hi. Any updates on your decisions?
I suppose before the SMTPUTF8 support in dovecot, it is best to disable SMTPUTF8 in postfix?
smtputf8_enable=no
@HLFH Yes, you need to disable smtputf8 in postfix. @arnt we are still looking at this, and we are maybe leaning into incorporating libunistring & libidna2 to do the unicode work for us. This would make some of the problem cases go away, especially the normalizations needed to make header searches work. Still, I can run your patch through our CI to see what it makes of it and if it spots any bigger issues.
@arnt it seems to pass current tests so that's at least good. There were some boolean issues, but we can take care of those if we merge this.
You don't really need to disable SMTPUTF8 in postfix — senders get the same error message in both cases. Disabling and enabling both have advantages, and both are really small advantages ;)
To whom should I send the test? UID SEARCH for the same word normalised and denormalised gave the same result, so something or other already handles normalisation.
I updated the PR so that adding a message with From: [email protected] now returns grå.org in the envelope and search from "grä" matches it.
Hi, we'll take a look.
Moved as internal merge request.
Merged as https://github.com/dovecot/core/compare/eabb4115a76f4ba3beb193f4fdb2f484bdf4da48^...8ea0933c8e4cbe8da2f47e91d47c72265037dfbb after some rounds of changes. This code needs to be compiled with --enable-experimental-mail-utf8.
@arnt Already merged, but now looked at this a bit more. Was there a reason MAILBOX_FEATURE_UTF8ACCEPT was added? Is it useful in some fuller implementation that doesn't exist yet? Since the current code otherwise could have been only inside imap-specific code, but this touched lib-storage as well.
You could say it relates to a fuller implementation… I also wrote code for read-time downgrading, but did not include that in the patch due to a bug I couldn't see how to solve. The bug can be shifted here or there.
If you do read-time downgrading and know from a storage flag that a particular mailbox contains no UTF8 addresses, then fetching BODYSTRUCTURE doesn't need to do slow downgrading. This was a common case and mattered for performance. But I couldn't get the storage flag to be reliable.
I now think that it's perhaps best to remove MAILBOX_FEATURE_UTF8ACCEPT and regard it as a FETCH performance problem rather than a flag reliability problem.
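The flag-based shortcut described above can be sketched in Python roughly as follows. Everything here is hypothetical: the `Mailbox` class, the `may_contain_utf8` attribute and the use of subjects instead of addresses are simplifications for illustration, and the downgrade step just uses RFC 2047 encoded-words from the standard `email.header` module rather than anything from the unpublished downgrading code.

```python
from email.header import Header, decode_header, make_header


def downgrade(value: str) -> str:
    """Return an ASCII-only form of a header value (RFC 2047 encoded-word)."""
    if value.isascii():
        return value
    return Header(value, "utf-8").encode()


class Mailbox:
    """Hypothetical in-memory mailbox; not Dovecot's data model."""

    def __init__(self) -> None:
        self.subjects: list[str] = []
        # Stands in for the per-mailbox storage flag discussed above.
        self.may_contain_utf8 = False

    def append(self, subject: str) -> None:
        self.subjects.append(subject)
        if not subject.isascii():
            self.may_contain_utf8 = True

    def fetch_subjects(self, client_accepts_utf8: bool) -> list[str]:
        # Fast path: nothing to downgrade, so skip the per-message work.
        if client_accepts_utf8 or not self.may_contain_utf8:
            return list(self.subjects)
        return [downgrade(s) for s in self.subjects]


box = Mailbox()
box.append("hello")
box.append("gr\u00e5")
ascii_view = box.fetch_subjects(client_accepts_utf8=False)
# The downgraded value decodes back to the original text.
print(str(make_header(decode_header(ascii_view[1]))))
```

The sketch only works if the flag is updated on every path that adds a message, which is the reliability problem mentioned above.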
I've a feeling read-time downgrading isn't very useful compared to just sending the UTF8 email to clients, even if they don't support UTF8. My guess is most clients would work just as well, or maybe even better.