angus-mail
IMAPProtocol: Problem with searching emails when the subject contains umlauts
Some Background
We search the IMAP server for specific emails based on their subject. In one of our tests we noticed that a search with umlauts (ü or ä) in the subject is far slower than it should be: it takes 30 minutes to find a single email in an inbox with more than 6000 emails.
We investigated the problem and found that in this case the IMAP server throws an exception, and the library falls back to the default implementation and loads all emails.
Details
We debugged the code and found the root cause of the error.
The search method in the IMAPProtocol class https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2494 has the following code:
// Check if the search "text" terms contain only ASCII chars,
// or if utf8 support has been enabled (in which case CHARSET
// is not allowed; see RFC 6855, section 3, last paragraph)
if (supportsUtf8() || SearchSequence.isAscii(term)) {
try {
return issueSearch(msgSequence, term, null);
} catch (IOException ioex) { /* will not happen */ }
}
Our IMAP server supports UTF-8, so the code correctly calls issueSearch with no charset. So far so good.
The problem occurs in issueSearch itself, on line 2552: https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2552
Here all SearchTerms are converted to an Argument:
// Generate a search-sequence with the given charset
Argument args = getSearchSequence().generateSequence(term,
charset == null ? null :
MimeUtility.javaCharset(charset));
In our case the charset is null, and the subject from the SearchTerm is then converted as follows:
public Argument writeString(String s, String charset) throws UnsupportedEncodingException {
if (charset == null) {
this.writeString(s);
} else {
this.items.add(new AString(s.getBytes(charset)));
}
return this;
}
In the end ASCIIUtility.getBytes(s) is called, which uses the default OS charset (on Windows this is not UTF-8). At this point all umlauts have the wrong representation in the byte array that is sent to the IMAP server.
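To illustrate why the choice of encoding matters here, the following is a minimal JDK-only sketch (it does not call angus-mail, and the class name UmlautBytes and helper hex are made up for this example). It prints the bytes produced for the same subject under UTF-8, ISO-8859-1, and the platform default charset. A server that expects UTF-8 would not recognize the single-byte form of "ü".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Hypothetical helper: encodes s with the given charset and
    // renders the resulting bytes as space-separated hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String subject = "\u00FCber"; // "über"

        // UTF-8 encodes "ü" as two bytes: C3 BC
        System.out.println("UTF-8:      " + hex(subject, StandardCharsets.UTF_8));

        // ISO-8859-1 encodes "ü" as one byte: FC
        System.out.println("ISO-8859-1: " + hex(subject, StandardCharsets.ISO_8859_1));

        // The platform default depends on the OS and JVM settings.
        Charset def = Charset.defaultCharset();
        System.out.println("default (" + def + "): " + hex(subject, def));
    }
}
```

Only the first two lines of output are the same on every machine. The third line depends on the environment, which is the kind of variation the description above is concerned with.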
We strongly believe that there should be a way to specify the encoding used for converting SearchTerms independently. Or maybe you can find a more elegant solution.
Thanks in advance.
I wonder if the bug is here:
// Generate a search-sequence with the given charset
Argument args = getSearchSequence().generateSequence(term,
charset == null ? null :
MimeUtility.javaCharset(charset));
When the charset is null, instead of unconditionally passing null it should pass "UTF-8" when supportsUtf8() is true, and null otherwise.
Looks like my suggestion is the same as: https://github.com/jakartaee/mail-api/issues/474
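The selection rule suggested above (pass "UTF-8" when the charset is null and supportsUtf8() is true, otherwise null) can be sketched in isolation as follows. This is only an illustration, not a patch: the class SearchCharsetChoice and the method effectiveCharset are hypothetical names, the supportsUtf8 flag stands in for the result of supportsUtf8(), and the MimeUtility.javaCharset conversion applied to a non-null charset in the real code is left out to keep the example JDK-only.

```java
public class SearchCharsetChoice {
    // Hypothetical helper, not part of angus-mail.
    // charset:      the charset passed to issueSearch (may be null)
    // supportsUtf8: stand-in for the result of supportsUtf8()
    // Returns the charset name to use when encoding the search strings,
    // or null to keep the current no-charset behavior.
    static String effectiveCharset(String charset, boolean supportsUtf8) {
        if (charset != null) {
            // An explicit charset is kept as it is today.
            return charset;
        }
        // No explicit charset: encode as UTF-8 only if the server
        // has UTF-8 support enabled, otherwise pass null as before.
        return supportsUtf8 ? "UTF-8" : null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset(null, true));
        System.out.println(effectiveCharset(null, false));
        System.out.println(effectiveCharset("ISO-8859-1", true));
    }
}
```

With a rule like this, the CHARSET argument would still be omitted from the SEARCH command when UTF-8 is enabled, and only the byte encoding of the search strings would change.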