
Incorrect encoding in non-latin quarantined mails, also after release

Open ValdikSS opened this issue 3 years ago • 47 comments

Prior to placing the issue, please check following: (fill out each checkbox with an X once done)

  • [X] I understand that not following or deleting the below instructions will result in immediate closure and/or deletion of my issue.
  • [X] I have understood that this bug report is dedicated for bugs, and not for support-related inquiries.
  • [X] I have understood that answers are voluntary and community-driven, and not commercial support.
  • [X] I have verified that my issue has not been already answered in the past. I also checked previous issues.

Summary

Mailcow commit a832becbd530603710a823be526a9ec4d9f1f89d. If an email in Windows-1251 encoding (other encodings may be affected as well) gets quarantined, its text is not displayed correctly in the quarantine web interface, and the email remains unreadable after release.

Logs

brokenmails.zip contains two copies of the same email: one in the correct encoding, exported from the junk folder, and the other as delivered to the inbox by the quarantine release.

Reproduction

  1. Get an email in Russian, with Windows-1251 encoding, quarantined
  2. Try to release the email
  3. Receive an unreadable email in the inbox

Screenshot_20210111_013505-fs8

Unfortunately I can no longer show you a screenshot of the quarantine web interface, because I trained similar emails as ham and they no longer go to quarantine.

System information

Question | Answer
My operating system | Linux Ubuntu 20.04
Is AppArmor, SELinux or similar active? | Yes, AppArmor. No issues with it in audit logs.
Virtualization technology (KVM, VMware, Xen, etc.; LXC and OpenVZ are not supported) | Bare metal
Server/VM specifications (Memory, CPU Cores) | 4 cores, 16 GB RAM
Docker Version (docker version) | 20.10.1
Docker-Compose Version (docker-compose version) | 1.27.4, build 40524192
Reverse proxy (custom solution) | Custom configuration, did not touch Mailcow configs, irrelevant
  • Output of git diff origin/master, any other changes to the code? No.
  • All third-party firewalls and custom iptables rules are unsupported. Please check the Docker docs about how to use Docker with your own ruleset. Nevertheless, iptables output can help us to help you: iptables -L -vn, ip6tables -L -vn, iptables -L -vn -t nat and ip6tables -L -vn -t nat.
  • DNS problems? Please run docker exec -it $(docker ps -qf name=acme-mailcow) dig +short stackoverflow.com @172.22.1.254 (set the IP accordingly, if you changed the internal mailcow network) and post the output.

ValdikSS avatar Jan 10 '21 22:01 ValdikSS

You don't have a db dump anymore, right?

Or any other mail with that problem currently in your quarantine?

andryyy avatar Jan 11 '21 16:01 andryyy

@andryyy I've used password recovery and got the message in quarantine, it's broken. How should I proceed?

Screenshot_20210111_194759-fs8

ValdikSS avatar Jan 11 '21 16:01 ValdikSS

image

Dunno, the mails seem to have encoding problems in general. :/

andryyy avatar Jan 11 '21 16:01 andryyy

Try receiving a new-post notification. It seems that the registration and password-reminder emails don't have a space between some header names and their values, but post notifications do.

ValdikSS avatar Jan 11 '21 16:01 ValdikSS

Please give me more time for this.

I think the mail encoding is a bit messed up, but I'm not sure yet...

The subject seems to be read as UTF-8 (perhaps?). Not sure.

andryyy avatar Jan 11 '21 18:01 andryyy

Here's the original email, a notification of a new forum message. The one in the first post took the path ruboard → [email protected] (mail.ru) → [email protected] (mailcow). This one is from the [email protected] mailbox. As you can see, the subject is encoded and is in Windows-1251, but the Content-type header has no space between its name and value: Content-type:text/plain;charset=Windows-1251. Maybe that's the issue.

Message16103458220307880921.zip

The password-reminder message, by contrast, has a proper Content-Type: text/plain; charset=Windows-1251 (with a space), but no encoding in the Subject header: Subject: Забыли пароль?.
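
For comparison, here is a minimal sketch of how the two Content-Type spellings behave under one particular parser. It uses Python's standard-library email package purely as an illustration; this is an assumption, not the parser mailcow uses, and body_text is a hypothetical helper name. Under this parser the missing space after the colon does not change the detected charset, so the spacing alone may not be the cause.

```python
from email import message_from_bytes


def body_text(raw: bytes) -> str:
    """Decode a single-part message body using its declared charset."""
    msg = message_from_bytes(raw)
    # get_content_charset() returns the charset parameter lower-cased, or None.
    charset = msg.get_content_charset() or "ascii"
    # decode=True undoes any Content-Transfer-Encoding and returns raw bytes.
    payload = msg.get_payload(decode=True)
    return payload.decode(charset, errors="replace")


# Hypothetical sample messages: same Windows-1251 body, two header spellings.
body = "Забыли пароль?".encode("cp1251")
no_space = b"Content-type:text/plain;charset=Windows-1251\r\n\r\n" + body
with_space = b"Content-Type: text/plain; charset=Windows-1251\r\n\r\n" + body
```

Calling body_text(no_space) and body_text(with_space) lets you check whether a given parser treats the two forms the same way.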

Message16103835610793656295(1).zip

ValdikSS avatar Jan 11 '21 18:01 ValdikSS

That's a good catch. :) I will check that.

andryyy avatar Jan 11 '21 20:01 andryyy

Here's another broken message, this time from Google Groups: message.zip. This message contains strange ÐžÑ symbols in the headers, near the To field. This is what Google sends for some reason (it persists in older messages as well).

X-BeenThere: [email protected]
Received: by 2002:a1c:2e50:: with SMTP id u77ls1076220wmu.2.canary-gmail; Tue,
 08 Dec 2020 05:02:21 -0800 (PST)
X-Received: by 2002:a05:600c:268b:: with SMTP id 11mr3827005wmt.78.1607432541168;
        Tue, 08 Dec 2020 05:02:21 -0800 (PST)
MIME-Version: 1.0
To: =?UTF-8?B?0JzQvtC00LXRgNCw0YLQvtGA0Ysg0YHQv9Cw0LzQsA==?= <[email protected]>
От: [email protected]
Subject: =?UTF-8?B?W2FjXSDQntGC0YfQtdGCINC80L7QtNC10YDQsNGC0L7RgNCwINC+INGB0L/QsNC80LUg?=
	=?UTF-8?B?0LIg0LPRgNGD0L/Qv9C1IGFudGljZW5zb3JpdHlAZ29vZ2xlZ3JvdXBzLmNvbQ==?=
Message-ID: <[email protected]>
Date: Tue, 08 Dec 2020 13:02:21 +0000
Content-Type: text/plain; charset="UTF-8"
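
The To and Subject headers above use RFC 2047 encoded words, which stay pure ASCII on the wire, unlike the raw От: line. A minimal sketch of decoding them, again using Python's standard-library email package as an illustrative assumption (decode_mime_words is a hypothetical helper name):

```python
from email.header import decode_header, make_header


def decode_mime_words(value: str) -> str:
    """Turn RFC 2047 encoded words (=?charset?B?...?=) into readable text."""
    return str(make_header(decode_header(value)))


# The display name from the To header above.
to_name = decode_mime_words(
    "=?UTF-8?B?0JzQvtC00LXRgNCw0YLQvtGA0Ysg0YHQv9Cw0LzQsA==?="
)
```

Header names themselves have no such encoding mechanism, which is why a non-ASCII name like От: is outside what a strict parser expects.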

Screenshot_20210112_012530-fs8

ValdikSS avatar Jan 11 '21 22:01 ValdikSS

I only see those content type failures with Russian mail, and not even with all of it. Someone needs to check whether these mails are correctly encoded/formatted, and whether we really want to work around it if they are not.

The previous "originals" also messed up my local mail client.

andryyy avatar Jan 12 '21 07:01 andryyy

I work with Cyrillic every day, and Postfix handles it all correctly. This issue is on the sender's side, and I don't think there must be, or even can be, a fix for a sender who sends mail with an incorrect MIME type/encoding.

dragoangel avatar Jan 17 '21 23:01 dragoangel

And when you create contacts in Russian, are they displayed correctly? I get question marks instead of Russian letters. I'm looking for an answer to this problem myself. In the demo on the SOGo website and on mailcow everything is OK, but my installation shows ????? signs like these.

Muwahhidun avatar Mar 01 '21 17:03 Muwahhidun

Do you use an external SQL?

andryyy avatar Mar 01 '21 17:03 andryyy

Do you use an external SQL?

No, I use the official docker compose setup. 19 containers.

Muwahhidun avatar Mar 01 '21 17:03 Muwahhidun

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] avatar Jun 02 '21 19:06 github-actions[bot]

The issue is still not fixed, please reopen. I can provide fresh .eml files.

ValdikSS avatar Jan 01 '22 18:01 ValdikSS

@andryyy, I also can provide database dumps. Not removing the quarantine data yet.

ValdikSS avatar Jan 01 '22 18:01 ValdikSS

That would be great. Can you mail to @.*** ?

I will need some time though as I’m currently in hospital.


andryyy avatar Jan 01 '22 21:01 andryyy

The email is not shown. Please mail me at [email protected], I'll mail you back.

ValdikSS avatar Jan 01 '22 21:01 ValdikSS

[email protected]

andryyy avatar Jan 01 '22 22:01 andryyy

If these errors only happen with the same wrongly encoded mails from your previously sent items I will not work on it. The sender will need to fix their issues then as it was stated before.

I don't think we are responsible for fixing that. :/

Drago works with Russian mail all the time. It is fine for him. Your example mail was totally broken.

andryyy avatar Jan 01 '22 22:01 andryyy

No, this time it's a Google Groups email. And others, I need to check.

ValdikSS avatar Jan 01 '22 22:01 ValdikSS

Please guide me on what I should do. Right now the email looks like this: Screenshot 2022-01-02 at 01-28-02 mailcow UI

ValdikSS avatar Jan 01 '22 22:01 ValdikSS

Haha, this email has Russian "От:" in the email header instead of "From:".

ValdikSS avatar Jan 01 '22 22:01 ValdikSS

Haha, this email has Russian "От:" in the email header instead of "From:".

(facepalm) omg 😱

dragoangel avatar Jan 02 '22 09:01 dragoangel

So right now there are two issues with Mailcow:

  1. The quarantine system ends the header section at the first non-7-bit-ASCII symbol instead of at \r\n\r\n.
  2. The message is re-encoded when entering quarantine and again when released; that's why broken mails come out of quarantine broken.

For 1), mailcow should split the headers from the body by searching for \r\n\r\n; for 2), mailcow should not assume an encoding and should treat emails as a sequence of bytes, at least when releasing.
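
Both suggestions can be sketched in a few lines. This is a minimal illustration in Python, not mailcow's actual code; the function names and the sample message (including the example.invalid address) are hypothetical. The first function splits on the first blank line while keeping everything as bytes; the second shows why decoding under an assumed charset and re-encoding is lossy.

```python
from typing import Tuple


def split_headers_body(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw message at the first blank line, without decoding anything."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # Tolerate bare-LF line endings, which some senders produce.
        head, sep, body = raw.partition(b"\n\n")
    return head, body


def lossy_reencode(data: bytes) -> bytes:
    """Show what happens when bytes are assumed to be UTF-8 and re-encoded."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# Hypothetical message: a non-ASCII header name and a Windows-1251 body.
raw = (
    "От: sender@example.invalid\r\n".encode("utf-8")
    + b"Content-Type: text/plain; charset=Windows-1251\r\n"
    + b"\r\n"
    + "Привет".encode("cp1251")
)
head, body = split_headers_body(raw)
```

Here head still contains the От: line and body holds the untouched Windows-1251 bytes, so storing and releasing raw would deliver the message exactly as received. By contrast, lossy_reencode(body) returns different bytes, and the original text cannot be recovered from them.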

ValdikSS avatar Jan 02 '22 10:01 ValdikSS

We use a very popular mail parser. I think your mails are a bit off.

Best regards, André Peters


andryyy avatar Jan 02 '22 11:01 andryyy

We use a very popular mail parser. I think your mails are a bit off.

Sure they are, but this shows that even giants such as Google can make a programming error and include a translated string in the headers.
It would be handy to have a more lenient parser, for better compatibility with broken emails.

ValdikSS avatar Jan 03 '22 21:01 ValdikSS

I'm not sure a mail header name can be written in non-Latin characters at all. I really don't see how it should be parsed, by Rspamd as well. If you make an SQL dump of this email, can you send it to me? Via Telegram, for example. I can't fix it, but I'd like to take a look. I have never come across such emails.

dragoangel avatar Jan 03 '22 21:01 dragoangel

Quite an old email; they even signed "От" in DKIM... Rspamd can't parse it either.

dragoangel avatar Jan 03 '22 21:01 dragoangel