smtp4dev
[Bug] The View tab doesn't show Latin characters when the message charset is iso-8859-1
When I send a message with Latin characters from a file with encoding iso-8859-1 (Western/Windows-1252), the "View" tab doesn't show those characters:
My file: [screenshot]
smtp4dev View tab: [screenshot]
But the Parts > Source tab does show them correctly: [screenshot]
The charset of the message: [screenshot]
Tech specs
- smtp4dev version: 3.1.0-ci0752
- Browser: Google Chrome Version 76.0.3809.100 (Official Build) (64-bit)
Thanks for reporting this issue and including enough details to reproduce it.
Hi @cheoAlejo
First, a disclaimer: I am quite green with internationalization, and it appears to be a minefield.
Anyhow, while researching this I came across https://stackoverflow.com/questions/25710599/content-transfer-encoding-7bit-or-8-bit
I may have taken this out of context, but are they suggesting that the 8bit transfer encoding is actually not legal (not recommended) on the internet?
As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the "8bit" or "binary" Content-Transfer-Encoding is actually legal on the Internet.
Is there any way you can set the body transfer encoding to QuotedPrintable instead of 8Bit?
I set up a test scenario, and when I change the encoding to QuotedPrintable, the text is shown correctly.
Test setup used:
using System.Net.Mail;
using System.Net.Mime;
using System.Text;
var smtpClient = new SmtpClient("localhost")
{
    Port = 25,
};
var mailMessage = new MailMessage
{
    From = new MailAddress("[email protected]"),
    Subject = "Latin test",
    BodyTransferEncoding = TransferEncoding.QuotedPrintable,
    Body =
"<span>Homines in indicaverunt nam purus quáestionem sentiri unum. Afflueret contentus diam errore faciam, honoris mucius omnem pélléntésqué reiciendis. Acuti admissum arbitrantur concederetur dediti, ferrentur fugiendus inferiorem peccant ponti quando solam ullius. áb atilii concursio constituamus, définitioném diligenter graeci illam máius operis opinionum pótióne versatur. Alliciat aspernari consoletur disserunt, impendere interiret reliquarum verum. Convállis essent foedus gravida iustioribus, mox notissima perpaulum praeclare probatum, prohiberet sensibus. Condimentum efficeretur iis insipientiam, inutile logikh ne ornare, paulo primis primo pugnare putarent quiddam reperiuntur. \r\nCéramico cónsistat éiusdém licet offendimur, recusandae referendá. Cupiditatés hónesta musicis possent, respondendum sollicitudines. Breviter democrito dolor electram illa, ludicra non occulta pérféréndis principio servare suum tranquillitatem. Consentinis probatus qualisque tollatur veritatis. In inséquitur ortum pertinaces, sentit stoici sum téréntii.</span>",
    BodyEncoding = Encoding.Latin1,
    IsBodyHtml = true,
};
mailMessage.To.Add("[email protected]");
smtpClient.Send(mailMessage);
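A simplified sketch of why quoted-printable sidesteps the problem: QP rewrites every byte outside printable ASCII as `=XX`, so what travels over SMTP is pure 7-bit ASCII and nothing in between has to guess how to interpret raw 8-bit bytes. `EncodeQuotedPrintable` is a hypothetical helper, not the System.Net.Mail implementation, and it deliberately skips RFC 2045's 76-character soft line breaks and trailing-whitespace rule.

```csharp
using System;
using System.Text;

// Hypothetical helper: simplified quoted-printable encoding (RFC 2045, section 6.7).
// Simplifications: no 76-character soft line breaks, no special handling of trailing
// spaces/tabs, and CR/LF are escaped like any other byte instead of kept as hard breaks.
static string EncodeQuotedPrintable(byte[] data)
{
    var sb = new StringBuilder();
    foreach (byte b in data)
    {
        // Printable ASCII (33-126) except '=' passes through, as does the space character.
        if ((b >= 33 && b <= 126 && b != (byte)'=') || b == (byte)' ')
            sb.Append((char)b);
        else
            sb.Append('=').Append(b.ToString("X2")); // e.g. 0xE1 becomes "=E1"
    }
    return sb.ToString();
}

// In ISO-8859-1 (Latin-1), 'á' is the single byte 0xE1.
byte[] latin1 = Encoding.Latin1.GetBytes("quáestionem");
Console.WriteLine(EncodeQuotedPrintable(latin1)); // qu=E1estionem
```

The receiving side reverses the `=XX` escapes to get back the exact Latin-1 bytes and then applies the declared charset, so the charset label and the bytes cannot drift apart in transit.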
This issue has been preventing us from using this awesome tool for a while now. Is there any temporary workaround for this?
@anuj2nt, are you able to supply a test client that creates the email (PHP, JS or C#)? I'm happy to investigate further; I'm just after some sample code that produces the email.
I have this issue as well. You could basically use the same test data as @cheoAlejo, but with text/html;charset=windows-1252.
@jafin According to the Stack Overflow post, it is not legal under RFC 1341, which is over 20 years old. Since then, however, the 8BITMIME extension (RFC 6152) has been added, which supports non-ASCII characters. So support for the 8BITMIME extension would be greatly appreciated. I am currently running tests from a COTS product and cannot modify the Content-Transfer-Encoding value.
@rnwood Are there any plans to include this feature in the near future?
Repro:
using System.Net.Mail;
using System.Net.Mime;
using System.Text;
var smtpClient = new SmtpClient("localhost")
{
    Port = 25,
};
var mailMessage = new MailMessage
{
    From = new MailAddress("[email protected]"),
    Subject = "Latin test",
    Body =
"<span>Homines in indicaverunt nam purus quáestionem sentiri unum. Afflueret contentus diam errore faciam, honoris mucius omnem pélléntésqué reiciendis. Acuti admissum arbitrantur concederetur dediti, ferrentur fugiendus inferiorem peccant ponti quando solam ullius. áb atilii concursio constituamus, définitioném diligenter graeci illam máius operis opinionum pótióne versatur. Alliciat aspernari consoletur disserunt, impendere interiret reliquarum verum. Convállis essent foedus gravida iustioribus, mox notissima perpaulum praeclare probatum, prohiberet sensibus. Condimentum efficeretur iis insipientiam, inutile logikh ne ornare, paulo primis primo pugnare putarent quiddam reperiuntur. \r\nCéramico cónsistat éiusdém licet offendimur, recusandae referendá. Cupiditatés hónesta musicis possent, respondendum sollicitudines. Breviter democrito dolor electram illa, ludicra non occulta pérféréndis principio servare suum tranquillitatem. Consentinis probatus qualisque tollatur veritatis. In inséquitur ortum pertinaces, sentit stoici sum téréntii.</span>",
    BodyEncoding = Encoding.Latin1,
    BodyTransferEncoding = TransferEncoding.EightBit,
    IsBodyHtml = true,
};
mailMessage.To.Add("[email protected]");
smtpClient.Send(mailMessage);
Unfortunately, I fear this is a very complex issue that lies in the Rnwood.Smtp4dev server. smtp4dev does support the 8BITMIME extension and UTF-8, but the client is not using them in this case. What is happening is that SmtpClient is doing "just-send-8bit": it encodes the body using whatever encoding is specified and sends the raw bytes. Unfortunately, I don't think we're handling that correctly, and the body ends up re-encoded as UTF-8. When the UI later displays the message, it reads the charset from the part's Content-Type header and decodes the body using that encoding, but the body is no longer encoded like that. This is what causes the broken characters.
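The failure described above can be reproduced in a few lines. This is a sketch of the symptom only, not of smtp4dev's actual code path: it assumes the server decodes the 8-bit body to text and stores it as UTF-8, while the UI still decodes the stored bytes with the declared charset=ISO-8859-1. Decoding the stored bytes as UTF-8 instead recovers the original text.

```csharp
using System;
using System.Text;

// What the client sends: "é" as raw 8-bit ISO-8859-1, i.e. the single byte 0xE9.
byte[] wireBytes = Encoding.Latin1.GetBytes("é");

// Assumed server behaviour: decode the body to text, then store it re-encoded as UTF-8.
string decodedByServer = Encoding.Latin1.GetString(wireBytes);
byte[] storedBytes = Encoding.UTF8.GetBytes(decodedByServer); // now 0xC3 0xA9

// The UI honours the part's declared charset (ISO-8859-1), which no longer matches the bytes.
string shownByUi = Encoding.Latin1.GetString(storedBytes);

// Decoding the stored bytes as what they really are (UTF-8) gives back "é".
string shownAsUtf8 = Encoding.UTF8.GetString(storedBytes);

Console.WriteLine($"{BitConverter.ToString(storedBytes)} -> declared charset: {shownByUi}, as UTF-8: {shownAsUtf8}");
```

Each Latin-1 character above 0x7F turns into two UTF-8 bytes, and reading those two bytes back as Latin-1 shows two characters (here "Ã©"), which is the pattern visible in the screenshots.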
In the linked PR, I have applied a workaround whilst I think about how to resolve this. The main issue is that I'm not sure what the correct (or most common) behaviour for the "just send 8" case is: what encoding should be used for the body, or should it just be treated as a stream of bytes? (That would be a major change.)
The workaround simply avoids this re-encoding in the UI and treats the body as UTF-8, which it is. This is not the correct fix, though. A build should be generated that you can use to confirm, but please note that I don't intend to merge this.
Further advice if you are seeing this bug: try to make sure your client is using the 8BITMIME extension, which forces UTF-8. This should avoid the issue. smtp4dev is probably not the only SMTP server or client software with an issue like this, given how undefined/unclear this behaviour is.
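For reference, this is roughly what using 8BITMIME looks like at the protocol level per RFC 6152: the server advertises the `8BITMIME` keyword in its EHLO reply, and the client declares `BODY=8BITMIME` on `MAIL FROM`. If the server does not advertise it, the RFC expects the client to downgrade the body to 7-bit (for example quoted-printable) rather than send raw 8-bit bytes. A minimal sketch of the client-side decision; `BuildMailFrom` is a hypothetical helper and the EHLO lines are made-up examples, not smtp4dev's actual reply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: build the MAIL FROM command, adding BODY=8BITMIME only when
// the body really contains 8-bit data and the server advertised the extension.
static string BuildMailFrom(IEnumerable<string> ehloReplyLines, string sender, bool bodyHas8BitData)
{
    // EHLO reply lines look like "250-8BITMIME" or "250 SIZE 1000"; the keyword starts at index 4.
    bool serverSupports8BitMime = ehloReplyLines
        .Where(line => line.Length > 4)
        .Select(line => line.Substring(4).Split(' ')[0])
        .Any(keyword => keyword.Equals("8BITMIME", StringComparison.OrdinalIgnoreCase));

    string command = $"MAIL FROM:<{sender}>";
    if (bodyHas8BitData && serverSupports8BitMime)
        command += " BODY=8BITMIME";
    return command;
}

string[] ehlo = { "250-localhost", "250-8BITMIME", "250 SMTPUTF8" }; // example reply
Console.WriteLine(BuildMailFrom(ehlo, "sender@example.com", bodyHas8BitData: true));
```

Whether a particular client library (System.Net.Mail, a COTS product, etc.) actually sends `BODY=8BITMIME` is up to that library; the "just-send-8bit" case in this issue is exactly a client sending 8-bit bytes without this declaration.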
PR #1344
Build 3.3.3-ci20240306100 should now be available.
https://dev.azure.com/rnwood/smtp4dev/_build/results?buildId=2229&view=artifacts&pathAsName=false&type=publishedArtifacts
I'm pleased to confirm that I have created https://github.com/rnwood/smtpserver/pull/173, which addresses this issue in the server component. In this "just send 8 bit" scenario, the message will no longer be decoded and re-encoded with the wrong encoding.
The resulting build then needs to be picked up by Smtp4dev.
The PR has now been updated with what I think is a fairly complete fix. The Source and Raw tabs have also been adjusted to detect the encoding from the relevant MIME part and transcode it to UTF-8 for display.
Please note that existing messages may display strangely, since they have already been transcoded incorrectly.
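A minimal sketch of the idea described above for the Source and Raw tabs: read the charset parameter from the part's Content-Type, decode the part's bytes with it, and hand the resulting string to the UI (which can then write it out as UTF-8). This is not smtp4dev's code: `GetCharset` and `DecodePart` are hypothetical helpers, the header parsing is deliberately naive (no RFC 2231 parameters or comments), a missing charset falls back to US-ASCII as RFC 2045 specifies, and the bytes are assumed to be already transfer-decoded. On .NET Core and .NET 5+, legacy code pages such as windows-1252 additionally need `CodePagesEncodingProvider` registered; ISO-8859-1 and UTF-8 work out of the box.

```csharp
using System;
using System.Text;

// Hypothetical helper: pull the charset parameter out of a Content-Type value.
static string GetCharset(string contentType)
{
    foreach (string part in contentType.Split(';'))
    {
        string p = part.Trim();
        if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            return p.Substring("charset=".Length).Trim('"', ' ');
    }
    return "us-ascii"; // RFC 2045 default when no charset is given
}

// Hypothetical helper: decode a part's (already transfer-decoded) bytes with its declared charset.
static string DecodePart(string contentType, byte[] body) =>
    Encoding.GetEncoding(GetCharset(contentType)).GetString(body);

byte[] latin1Body = { 0x4A, 0x61, 0x64, 0xE5 }; // "Jadå" sent as ISO-8859-1
Console.WriteLine(DecodePart("text/plain;charset=ISO-8859-1", latin1Body));
```

Note that this only helps when the charset label is truthful; if a client labels UTF-8 bytes as ISO-8859-1, decoding by the label still produces mojibake.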
The PR will be merged. Feedback is invited for builds >= 3.3.3-ci20240309103.
Thanks for the quick update! I tested in 3.3.3-ci20240309104. However, the characters still don't display correctly; they are still displayed as in the raw output shown below (the subject line does show åäö correctly, though):
From:[email protected]
Reply-To:[email protected]
Subject:=?UTF-8?B?VGVzdMOlw6TDtg==?=
date: lör 09 mar 2024 20:30 +0000
To:casi
MIME-Version:1.0
Content-type:multipart/alternative;boundary="----_=_alt_boundary_1_1710016257"
------_=_alt_boundary_1_1710016257
Content-type:text/plain;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline
Overview: Current Task: New Task 1 Process Name: Mailtest : 1234/A;1-Test Due Date: None Comments: (none) Instructions: JadÃÂ¥, här testas bÃÂ¥de ÃÂ¥ ä och ö. Hoppas att det funkar dÃÂ¥! àé à""&
------_=_alt_boundary_1_1710016257
Content-type:text/html;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline
<!DOCTYPE html>
<html>
<head>
<style>
.tableBordered{
border:1px solid Black;
border-collapse:collapse;
padding:2px;
}
</style>
</head>
<body style="font-family:arial">
<div style="color:#448da6; font-weight:bold; margin-bottom:3px;">Overview:</div>
<table style="font-family:arial">
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Current Task: </td>
<td style="vertical-align: top; text-align:left;">New Task 1 </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Process Name: </td>
<td style="vertical-align: top; text-align:left;">Mailtest : 1234/A;1-Test </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Due Date: </td>
<td style="vertical-align: top; text-align:left;">None </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Comments: </td>
<td style="vertical-align: top; text-align:left;">(none) </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Instructions: </td>
<td style="vertical-align: top; text-align:left;">JadÃÂ¥, här testas bÃÂ¥de ÃÂ¥ ä och ö. Hoppas att det funkar dÃÂ¥! àé à""& </td>
</tr>
</table>
</body>
</html>
------_=_alt_boundary_1_1710016257--
@ballaballaballa Can you share the correct text for the body?
I've tested åäö with a variety of encodings, including just-send-8-bit ISO-8859-1, and I can't reproduce it.
If we take this word as an example from what you are seeing:
JadÃ¥
In ISO-8859-1 (which is what the part claims to be encoded as), this is encoded as:
Ã => 0xC3
¥ => 0xA5
But in UTF-8, the byte sequence 0xC3 0xA5 is å.
In ISO-8859-1, å should be 0xE5. So I believe the content of this message is actually UTF-8. We need to determine whether the original client is encoding it incorrectly, or whether smtp4dev is still doing something wrong.
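The byte-level reasoning above can be checked mechanically. A minimal sketch, assuming the raw bytes of the part are at hand: it prints how å is encoded in each charset and applies a hypothetical `LooksLikeUtf8` heuristic that checks whether a part labelled ISO-8859-1 is in fact valid UTF-8 with non-ASCII content. It is only a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so it can suggest but not prove the real encoding.

```csharp
using System;
using System.Linq;
using System.Text;

// Strict UTF-8 decoder: throws on invalid byte sequences instead of substituting U+FFFD.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

// Hypothetical heuristic: non-ASCII bytes that also form valid UTF-8 suggest the part
// is really UTF-8, whatever its charset parameter says.
bool LooksLikeUtf8(byte[] bytes)
{
    if (bytes.All(b => b < 0x80)) return false; // pure ASCII reads the same in both charsets
    try
    {
        strictUtf8.GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}

Console.WriteLine(BitConverter.ToString(Encoding.Latin1.GetBytes("å"))); // E5
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("å")));   // C3-A5

byte[] suspect = Encoding.UTF8.GetBytes("Jadå");       // UTF-8 bytes, mislabelled as ISO-8859-1
Console.WriteLine(Encoding.Latin1.GetString(suspect)); // read by the label: JadÃ¥
Console.WriteLine(LooksLikeUtf8(suspect));                          // True
Console.WriteLine(LooksLikeUtf8(Encoding.Latin1.GetBytes("Jadå"))); // False: lone 0xE5 is invalid UTF-8
```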
If you can still reproduce this, would you be able to get a Wireshark trace of the session? This will unambiguously show what's going on. Unfortunately, the session log in smtp4dev is text-based, so we can't see how the characters were encoded as bytes.
The weird thing is that the Source tab displayed the characters correctly before your fix, while the View tab displayed them the same way as it does now. I'll try to get a trace.
So I ran another test; the raw output is below. However, when checking in Wireshark, it differs a bit. The line "date: mån 11 mar 2024 19:58 +0000" in the smtp4dev raw output appears in the Wireshark output as "date: m�n 11 mar 2024 19:58 +0000\r\n", so there smtp4dev shows the correct text. The line "Process Name: !mailtest : 123/A;1-TestÃ¥äö", on the other hand, appears in Wireshark as "Process Name: !mailtest : 123/A;1-Teståäö\r\n", so there the Wireshark output is the correct one. The same goes for everywhere else "åäö" is written.
Raw output from smtp4dev:
From:[email protected]
Reply-To:[email protected]
Subject:=?UTF-8?B?dGVzdCDDpcOkw7Y=?=
date: mån 11 mar 2024 19:58 +0000
To:[email protected]
MIME-Version:1.0
Content-type:multipart/alternative;boundary="----_=_alt_boundary_1_1710187095"
------_=_alt_boundary_1_1710187095
Content-type:text/plain;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline
Overview: Current Task: New Task 1 Process Name: !mailtest : 123/A;1-TestÃ¥äö Due Date: None Email From: admin
Comments: comment Ã¥äö Instructions: (none)
Attachment: Name Type
This email was sent from Teamcenter.
------_=_alt_boundary_1_1710187095
Content-type:text/html;charset=ISO-8859-1
Content-transfer-encoding:8bit
Content-Disposition:inline
<!DOCTYPE html>
<html>
<head>
<style>
.tableBordered{
border:1px solid Black;
border-collapse:collapse;
padding:2px;
}
</style>
</head>
<body style="font-family:arial">
<div style="color:#448da6; font-weight:bold; margin-bottom:3px;">Overview:</div>
<table style="font-family:arial">
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Current Task: </td>
<td style="vertical-align: top; text-align:left;">New Task 1 </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Process Name: </td>
<td style="vertical-align: top; text-align:left;">!mailtest : 123/A;1-TestÃ¥äö </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Due Date: </td>
<td style="vertical-align: top; text-align:left;">None </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Email From: </td>
<td style="vertical-align: top; text-align:left;">admin (casi) </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Comments: </td>
<td style="vertical-align: top; text-align:left;">comment Ã¥äö </td>
</tr>
<tr>
<td style="vertical-align: top; text-align:left; color:#808080;">Instructions: </td>
<td style="vertical-align: top; text-align:left;">(none) </td>
</tr>
</table>
<br>
<br>
<div style="font-weight:bold; color:#808080;">
This email was sent from Teamcenter.</div>
</body>
</html>
------_=_alt_boundary_1_1710187095--
@ballaballaballa I'm pretty sure that this shows that your client is sending UTF-8 encoded content but declaring it as ISO-8859-1. I believe Wireshark is assuming UTF-8.
Can you select one of the non-ASCII characters in Wireshark and see how it has been encoded as bytes? This will confirm it one way or the other.
I'm tempted to add a binary session log to smtp4dev to help with complex issues like this.
Yes, you are correct. I copied the line as hex and it only shows correctly when converted to utf-8. Here is the line in binary: 00110011001011110100000100111011001100010010110101010100011001010111001101110100110000111010010111000011101001001100001110110110 that shows åäö correctly when converted to utf-8. I will write a bug report to their support. Thanks for the help!
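The bit string above can be decoded to double-check the diagnosis. A small sketch: split it into 8-bit groups (128 bits, so 16 bytes) and decode the same bytes once as UTF-8 and once as ISO-8859-1. `BitsToBytes` is a hypothetical helper; the bit string is copied verbatim from the comment.

```csharp
using System;
using System.Text;

// Hypothetical helper: turn a string of '0'/'1' characters into bytes, 8 bits per byte.
static byte[] BitsToBytes(string bits)
{
    if (bits.Length % 8 != 0) throw new ArgumentException("Bit count must be a multiple of 8.");
    var result = new byte[bits.Length / 8];
    for (int i = 0; i < result.Length; i++)
        result[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2); // parse one base-2 octet
    return result;
}

const string captured =
    "00110011001011110100000100111011001100010010110101010100011001010111001101110100110000111010010111000011101001001100001110110110";

byte[] bytes = BitsToBytes(captured);
Console.WriteLine(BitConverter.ToString(bytes));     // 33-2F-41-3B-31-2D-54-65-73-74-C3-A5-C3-A4-C3-B6
Console.WriteLine(Encoding.UTF8.GetString(bytes));   // 3/A;1-Teståäö
Console.WriteLine(Encoding.Latin1.GetString(bytes)); // 3/A;1-TestÃ¥Ã¤Ã¶
```

The pairs C3-A5, C3-A4 and C3-B6 are the UTF-8 encodings of å, ä and ö; in ISO-8859-1 those letters would be the single bytes E5, E4 and F6. So the client sent UTF-8 while labelling the part ISO-8859-1.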
I guess, though, that Outlook and other clients have support for handling this, even though the data is incorrect.
Thanks for the confirmation. Closing this issue now.