angus-mail

IMAPProtocol: Problem with searching emails when the subject contains umlauts

Open vitaliiavdiienko opened this issue 1 year ago • 2 comments

Some Background

We search the IMAP server for specific emails based on their subject. In one of our tests we noticed that searching with umlauts (ü or ä) in the subject performs far worse than it should: it takes 30 minutes to find 1 email in an inbox with more than 6000 emails.

We investigated the problem and found that in this case the IMAP server throws an exception, and the library falls back to the default implementation and loads all emails.

Details

We debugged the code and found the root cause of the error.

The search method in the IMAPProtocol class (https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2494) has the following code:

        // Check if the search "text" terms contain only ASCII chars,
        // or if utf8 support has been enabled (in which case CHARSET
        // is not allowed; see RFC 6855, section 3, last paragraph)
        if (supportsUtf8() || SearchSequence.isAscii(term)) {
            try {
                return issueSearch(msgSequence, term, null);
            } catch (IOException ioex) { /* will not happen */ }
        }
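
The condition quoted above can be illustrated with a small standalone sketch. It does not use angus-mail: the class and both method names are hypothetical, and the ASCII check works on a plain String, whereas SearchSequence.isAscii takes a SearchTerm. It only shows why a subject with umlauts reaches the no-charset path solely when UTF-8 support is enabled.

```java
// Hypothetical sketch; not angus-mail code.
final class AsciiCheckSketch {

    // True if every character of s is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    // Same shape as the quoted condition: supportsUtf8() || isAscii(term).
    static boolean takesNoCharsetPath(boolean utf8Enabled, String subject) {
        return utf8Enabled || isAscii(subject);
    }

    public static void main(String[] args) {
        String subject = "Pr\u00FCfung"; // "Prüfung"
        System.out.println(takesNoCharsetPath(true, subject));
        System.out.println(takesNoCharsetPath(false, subject));
    }
}
```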

Our IMAP server supports UTF-8, and the code correctly calls issueSearch with no charset. So far so good. The problem occurs in issueSearch itself, on line 2552: https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2552

Here all SearchTerms are converted to an Argument:

        // Generate a search-sequence with the given charset
        Argument args = getSearchSequence().generateSequence(term,
                charset == null ? null :
                        MimeUtility.javaCharset(charset)
        );

In our case the charset is null, so the subject from the SearchTerm is converted as follows:

    public Argument writeString(String s, String charset) throws UnsupportedEncodingException {
        if (charset == null) {
            this.writeString(s);
        } else {
            this.items.add(new AString(s.getBytes(charset)));
        }

        return this;
    }

In the end, ASCIIUtility.getBytes(s) is called, and it uses the default OS charset (which on Windows is not UTF-8). At this point all umlauts have the wrong representation in the byte array that is sent to the IMAP server.
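
To make the byte mismatch concrete, here is a minimal standalone sketch that encodes a subject containing an umlaut with two charsets. It does not call angus-mail. ISO-8859-1 is only an assumed stand-in for a non-UTF-8 encoding, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of angus-mail.
final class UmlautBytesDemo {

    // Encode the text with the given charset and render the bytes as hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String subject = "\u00FCber"; // "über"
        // A server expecting UTF-8 needs the two-byte sequence for the umlaut.
        System.out.println("UTF-8:      " + hex(subject, StandardCharsets.UTF_8));      // C3 BC 62 65 72
        // A single-byte encoding sends one byte instead, which is not valid UTF-8.
        System.out.println("ISO-8859-1: " + hex(subject, StandardCharsets.ISO_8859_1)); // FC 62 65 72
    }
}
```

A server that has UTF-8 enabled would expect the first byte sequence. Receiving the second is consistent with the failure described above.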

We strongly believe that there should be a way to specify the encoding used for converting SearchTerms independently. Or maybe you can find a more elegant solution.

Thanks in advance.

vitaliiavdiienko, Jul 31 '23 14:07

I wonder if the bug is here:

        // Generate a search-sequence with the given charset
        Argument args = getSearchSequence().generateSequence(term,
                charset == null ? null :
                        MimeUtility.javaCharset(charset)
        );

When the charset is null, instead of unconditionally passing null, it should pass "UTF-8" when supportsUtf8() is true, and null otherwise.
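
The selection logic suggested here could look roughly like the sketch below. It is a standalone illustration, not a patch: the class, the method effectiveCharset, and its parameters are hypothetical, and the boolean stands in for the result of supportsUtf8(). In the quoted code a non-null charset also goes through MimeUtility.javaCharset, which this sketch leaves out to stay self-contained.

```java
// Hypothetical helper illustrating the suggestion; not angus-mail code.
final class SearchCharsetSketch {

    // Returns the charset name to use when encoding the search strings.
    // A null result means "no explicit charset", as in the current code.
    static String effectiveCharset(String requestedCharset, boolean serverSupportsUtf8) {
        if (requestedCharset != null) {
            // An explicitly requested charset is kept as-is.
            return requestedCharset;
        }
        // No charset requested: use UTF-8 only if the server has it enabled.
        return serverSupportsUtf8 ? "UTF-8" : null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset(null, true));
        System.out.println(effectiveCharset(null, false));
        System.out.println(effectiveCharset("ISO-8859-1", true));
    }
}
```

The CHARSET argument of the SEARCH command is a separate matter: per the comment quoted earlier, it is not allowed once UTF-8 support is enabled. The sketch only concerns how the search strings are encoded on the client side.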

jmehrens, Aug 01 '23 15:08

Looks like my suggestion is the same as: https://github.com/jakartaee/mail-api/issues/474

jmehrens, Aug 23 '23 02:08