mail-api
Non US-ASCII characters in display name parsed incorrectly by InternetAddress(String)
Non-US-ASCII characters end up unencoded in the encodedDisplay field when the InternetAddress(String) constructor is used to parse valid addresses such as "тест" <[email protected]>.
The workaround is to use InternetAddress(String address, String display) or InternetAddress(String address, String display, String charset).
An address that contains non-ASCII characters is not a valid RFC 822 address. The InternetAddress(String) constructor and the parse(String) method require a valid RFC 822 address.
What version of JavaMail are you using? When I use the address above, I get this exception with the latest version of JavaMail:
Exception in thread "main" javax.mail.internet.AddressException: Local address contains control or whitespace in string ``"тест" [email protected]''
Sorry, some of the markup got lost. I updated my comment.
AFAIK, an RFC 822 address allows non-US-ASCII characters in the display name; they just need to be encoded when the message is sent. MimeMessage.addRecipient() works correctly when I add such an address, as long as I use the InternetAddress(String address, String display) constructor. The display name gets garbled (it is not encoded) if I use InternetAddress(String addressAndDisplayName).
We use JavaMail 1.4.6.
Right, non-ASCII characters have to be encoded for RFC 822, and the characters in your example are not.
I still get the exception with JavaMail 1.4.6 (which is 6 years old).
```
$ cat iatest.java
import javax.mail.internet.InternetAddress;

public class iatest {
    public static void main(String[] argv) throws Exception {
        InternetAddress a = new InternetAddress("\"тест\" [email protected]");
        System.out.println(a);
    }
}
$ javac iatest.java
$ java iatest
Exception in thread "main" javax.mail.internet.AddressException: Local address contains control or whitespace in string ``"тест" [email protected]''
    at javax.mail.internet.InternetAddress.checkAddress(InternetAddress.java:1341)
    at javax.mail.internet.InternetAddress.parse(InternetAddress.java:1191)
    at javax.mail.internet.InternetAddress.parse(InternetAddress.java:728)
    at javax.mail.internet.InternetAddress.<init>(InternetAddress.java:95)
    at iatest.main(iatest.java:5)
```
@iamranger I think you misunderstood the API documentation. Encoding is never performed on the 'address' argument, only on the 'personal' argument, if one is present. Your observation is therefore consistent with the API documentation: the single-argument constructor, which accepts only an 'address' argument, does not perform any encoding.
The single argument constructor takes an RFC 822 formatted address, which can include both an email address and a personal name field. The key is that the argument needs to be formatted per RFC 822, which means it needs to be all-ASCII with any non-ASCII characters encoded in the original string.
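To illustrate what "encoded in the original string" means, here is a JDK-only sketch that turns a non-ASCII display name into an RFC 2047 "B" encoded-word and assembles an all-ASCII address string. The helper names (toEncodedWord, toAsciiAddress) and the address user@example.com are hypothetical. In a real JavaMail program, MimeUtility.encodeWord would normally do the encoding, and it may choose a different charset or the "Q" encoding, so its output can differ from this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordSketch {

    // Hypothetical helper: builds a single RFC 2047 "B" encoded-word,
    // =?UTF-8?B?<base64 of the UTF-8 bytes>?=. It does not split long names
    // into several encoded-words (RFC 2047 limits each one to 75 characters).
    public static String toEncodedWord(String displayName) {
        byte[] utf8 = displayName.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    // Hypothetical helper: assembles an all-ASCII "name <address>" string.
    // The encoded-word is not wrapped in double quotes, because RFC 2047
    // does not allow encoded-words inside a quoted-string.
    public static String toAsciiAddress(String displayName, String address) {
        return toEncodedWord(displayName) + " <" + address + ">";
    }

    public static void main(String[] args) {
        // "\u0442\u0435\u0441\u0442" is the Cyrillic name "тест" from the report
        String s = toAsciiAddress("\u0442\u0435\u0441\u0442", "user@example.com");
        System.out.println(s); // prints =?UTF-8?B?0YLQtdGB0YI=?= <user@example.com>
    }
}
```

With JavaMail on the classpath, a string built this way should be accepted by new InternetAddress(String), and getPersonal() should return the decoded name.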
I believe this is all working according to the spec. I'll close this issue soon lacking other evidence of a problem.
This is the test I wrote. The two constructors still behave differently in the 1.6.3 release. Note that no exception is thrown in either case. The first constructor simply stores the display name unencoded in the "encodedDisplay" field, which leads to incorrect behavior of addRecipient (the display name is sent unencoded). The second constructor works as expected (the non-US-ASCII characters are correctly encoded when the recipient is added). I see no good reason for the two behaviors to differ. The first constructor parses this address correctly; it should just detect that encoding is needed and encode the display name, exactly as the second constructor and the setPersonal method do.
```java
import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

public class AddressEncodingTest {
    public static void main(String[] args) throws Exception {
        // Session with default (empty) properties; no mail is actually sent
        Session session = Session.getInstance(new Properties());
        MimeMessage msg = new MimeMessage(session);
        String japaneseDisplayName = "ALMのバックオフ";
        String email = "[email protected]";
        String fullAddress = "\"" + japaneseDisplayName + "\" <" + email + ">";
        // use InternetAddress(String address)
        msg.addRecipient(Message.RecipientType.TO, new InternetAddress(fullAddress));
        // use InternetAddress(String address, String personal)
        msg.addRecipient(Message.RecipientType.TO, new InternetAddress(email, japaneseDisplayName));
        String encodedAddress1 = msg.getAllRecipients()[0].toString();
        String encodedAddress2 = msg.getAllRecipients()[1].toString();
        // should not match, as the display name is expected to be encoded
        System.out.println("TEST#1 " + (fullAddress.equals(encodedAddress1) ? "FAILED" : "PASSED"));
        // should not match, as the display name is expected to be encoded
        System.out.println("TEST#2 " + (fullAddress.equals(encodedAddress2) ? "FAILED" : "PASSED"));
        // the addresses created by both constructors should be the same
        System.out.println("TEST#3 " + (encodedAddress2.equals(encodedAddress1) ? "PASSED" : "FAILED"));
    }
}
```
The workaround for the issue:
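```java
// Parse the single-line address, then rebuild it with the two-argument
// constructor (which may throw UnsupportedEncodingException) so that the
// display name gets encoded.
InternetAddress recipient = new InternetAddress(address);
if (recipient.getPersonal() != null) {
    recipient = new InternetAddress(recipient.getAddress(), recipient.getPersonal());
}
```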
Ok, now we're getting somewhere...
If you already have the display name and the email address as separate strings, you should clearly use the constructor that takes them as separate arguments.
The single-arg constructor doesn't check for non-ASCII characters and fail, even though the documentation says that the input must be in RFC 822 format. Changing the constructor to fail in this case would probably be too large a compatibility risk. Changing the constructor to encode the non-ASCII characters in this case might be less risky, but might still break existing programs. It might be better to add a parseUnicodeString method to handle this case.
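The first option above (failing on unencoded non-ASCII input) can also be applied by an application itself, without any change to JavaMail. The sketch below is JDK-only; the class and method names (AsciiCheckSketch, isAscii, requireAscii) are hypothetical and are not part of the JavaMail API.

```java
public class AsciiCheckSketch {

    // True when every UTF-16 code unit is in the US-ASCII range (0x00-0x7F).
    // Characters outside the BMP are stored as surrogate pairs (0xD800 and up),
    // so they are reported as non-ASCII too.
    public static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    // Hypothetical strict pre-check for a string that is about to be passed
    // to new InternetAddress(String): fail fast on unencoded non-ASCII text.
    public static String requireAscii(String address) {
        if (!isAscii(address)) {
            throw new IllegalArgumentException(
                    "Non-ASCII characters must be encoded (RFC 2047) first: " + address);
        }
        return address;
    }

    public static void main(String[] args) {
        System.out.println(requireAscii("\"Test User\" <user@example.com>"));
        try {
            requireAscii("\"\u0442\u0435\u0441\u0442\" <user@example.com>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An application that accepts a single-line address could run such a check first and, when it fails, fall back to the parse-and-rebuild workaround posted earlier in this thread instead of rejecting the input.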
How are you encountering these non-encoded strings in your real application? Are you constructing them yourself, as shown in your test program? Are you discovering them in email messages? Are you reading them from a database? Are you accepting them as input typed by a user?
Yes, it's the real application, and the address in this case is entered by a user as a single line. The input is in a correct format (RFC 822) but with international (Japanese) characters in the display name. Most email clients support extended characters in the display name, so that was probably their thinking. The application then passed the input, as is, to the InternetAddress(String) constructor. The customer complained that the display name in the emails their recipients received did not match (to put it mildly) the name they entered. The rest of the information comes from my debugging of this issue. We have currently settled on the workaround above: use InternetAddress(String) to parse the address into parts, then use InternetAddress(addr, displayName) if there is a display name. It does work, so this is hardly a show stopper; I just thought I'd let people know about it.
I wouldn't normally expect a user to type in an email address in that format since it's very likely that they would get the syntax wrong. Normally I would expect to ask for their real name and their email address separately.