mail-mime-parser

High CPU usage with emails in specific encodings

Open smariussorin opened this issue 1 year ago • 3 comments

Hello,

We found that for two encodings, "WINDOWS-1256" and "WINDOWS-1253", calling 'getTextContent()' or 'getHtmlContent()' drives the CPU up to 100%. The only workaround we have right now is to override the charset with 'utf-8':

$messageTextPart = $message->getTextPart();
if($messageTextPart != null){
    $charsetText = $messageTextPart->getCharset();
    if (!isset($charsetText) || in_array($charsetText, $this->overrideCharsets)) {
        $messageTextPart->setCharsetOverride('utf-8');
    }
}

$messageHtmlPart = $message->getHtmlPart();
if($messageHtmlPart != null){
    $charsetBody = $messageHtmlPart->getCharset();
    if (!isset($charsetBody) || in_array($charsetBody, $this->overrideCharsets)) {
        $messageHtmlPart->setCharsetOverride('utf-8');
    }
}

Do you have any recommendations? Other encodings may appear. Is it recommended to force the 'utf-8' charset for all emails?

smariussorin, Jan 23 '24 08:01

Hi @smariussorin --

I use mb_convert_encoding, falling back to iconv if that doesn't work out. That's part of a separate project though: zbateson/mb-wrapper. I would be surprised if windows-1256/1253 are the issue. Are you able to try the email on different versions of PHP, different OSes, or different versions of iconv?

All the best

zbateson, Jan 29 '24 17:01
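
To check whether the conversion itself is slow on a given PHP/iconv build, the "mb_convert_encoding, then iconv" idea can be tried in isolation. This is only a rough sketch of that fallback order, not the actual zbateson/mb-wrapper code; the function name convertToUtf8 is made up for illustration, and it assumes the mbstring and iconv extensions are loaded.

```php
<?php
// Hypothetical helper: NOT the mb-wrapper implementation, just the same
// "mbstring first, iconv second" order described above.
function convertToUtf8(string $bytes, string $fromCharset): string
{
    // Use mbstring only when it reports the charset under this exact name
    // (case-insensitive); aliases are left to iconv.
    $known = array_map('strtolower', mb_list_encodings());
    if (in_array(strtolower($fromCharset), $known, true)) {
        $converted = mb_convert_encoding($bytes, 'UTF-8', $fromCharset);
        if (is_string($converted)) {
            return $converted;
        }
    }

    // Fall back to iconv; '@' hides the warning emitted for unknown charsets.
    $converted = @iconv($fromCharset, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from charset: {$fromCharset}");
    }
    return $converted;
}
```

Timing a call such as convertToUtf8($rawBody, 'WINDOWS-1256') on the affected machine and on another OS or PHP version would show whether the charset conversion is where the CPU time goes.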

It's very weird, but using setCharsetOverride solves the issue.

I have a similar issue when I try to read an attachment with charset "238". Is there any way to force UTF-8 for attachments as well?

smariussorin, Feb 06 '24 12:02

'setCharsetOverride' can be used on any part, including attachments; it's part of IMessagePart: https://mail-mime-parser.org/api/2.4/classes/ZBateson-MailMimeParser-Message-IMessagePart.html

zbateson, Feb 06 '24 18:02
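
Since setCharsetOverride is available on every part, the workaround from the first post could be applied to attachments with one loop. A minimal sketch, assuming the parts expose getCharset() and setCharsetOverride() as used earlier in this thread; the helper name forceUtf8Override is hypothetical, and the charset comparison is case-insensitive because the exact casing returned by getCharset() is not assumed here.

```php
<?php
// Hypothetical helper: applies the 'utf-8' override to every part whose
// charset is missing or listed in $overrideCharsets.
// Returns the number of parts that were overridden.
function forceUtf8Override(iterable $parts, array $overrideCharsets): int
{
    $targets = array_map('strtolower', $overrideCharsets);
    $changed = 0;
    foreach ($parts as $part) {
        $charset = $part->getCharset();
        if ($charset === null || in_array(strtolower($charset), $targets, true)) {
            $part->setCharsetOverride('utf-8');
            $changed++;
        }
    }
    return $changed;
}
```

With a parsed message this might be called as forceUtf8Override($message->getAllAttachmentParts(), ['windows-1256', 'windows-1253', '238']), assuming getAllAttachmentParts() is available on the message in the 2.x API; check the linked API docs for your version. The override only marks the bytes as already being UTF-8, so content that really is in another charset will not be converted.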