php-imap
Attachments (file)names are not correctly decoded
Describe the bug
In some cases, attachment (file)names are not correctly decoded and contain invalid characters. This happens for names encoded like this: ISO-8859-1''caf%E9.txt. Note that this is not the encoded-word syntax (btw, I cannot find the name of this encoding, do you know it?). The ISO-8859-1 charset prefix is simply ignored.
Used config
'options' => [
    'decoder' => [
        'message' => 'iconv',
        'attachment' => 'iconv',
    ],
],
Code to Reproduce
$clientManager = new \Webklex\PHPIMAP\ClientManager();
$clientManager->setConfig([
    'options' => [
        'decoder' => [
            'message' => 'iconv',
            'attachment' => 'iconv',
        ],
    ],
]);
$email = file_get_contents(__DIR__ . '/email.txt');
$message = \Webklex\PHPIMAP\Message::fromString($email);
foreach ($message->getAttachments() as $attachment) {
    $name = $attachment->getName();
    echo "Attachment: {$name}\n";
}
Here is an example of a problematic email: email.txt (generated with Gnome Evolution).
Expected behavior
The attachment name should be café.txt, but it is caf�.txt.
Desktop / Server (please complete the following information):
- OS: Docker image php:8.1-fpm (Debian I guess?)
- PHP: 8.1
- Version: 5.5.0
- Provider: Gnome Evolution
Additional context
I was able to spot the issue.
In Attachment::decodeName, you test that $name contains the string '' and get the "real" name from it, but you drop the encoding. In my example, ISO-8859-1''caf%E9.txt becomes caf%E9.txt.
A few lines later, you urldecode() the name. Unfortunately, in my case, %E9 is the ISO-8859-1 byte for the character é, while it would be %C3%A9 in UTF-8. This means that the decoded string still needs to be converted from ISO-8859-1 to UTF-8 with EncodingAliases::convert($name, $encoding) ($encoding being $parts[0], extracted earlier).
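To illustrate the two steps described above (percent-decode first, then convert from the declared charset), here is a minimal standalone sketch. The function name decodeExtendedValue is hypothetical and is not part of php-imap. It uses plain iconv() rather than the library's own conversion helper, and assumes the value follows the charset'language'value layout seen in the example. A plain filename that happens to contain two apostrophes would be misread by this sketch.

```php
<?php

/**
 * Hypothetical helper (not part of php-imap): decode a value such as
 * "ISO-8859-1''caf%E9.txt" into a UTF-8 string.
 */
function decodeExtendedValue(string $value): string
{
    // Assumed layout: charset'language'percent-encoded-bytes (language may be empty).
    $parts = explode("'", $value, 3);
    if (count($parts) !== 3) {
        // Not in the extended format: return the value unchanged.
        return $value;
    }

    [$charset, , $encoded] = $parts;

    // Percent-decoding yields raw bytes in the declared charset, not UTF-8.
    $bytes = rawurldecode($encoded);

    if ($charset === '' || strcasecmp($charset, 'UTF-8') === 0) {
        return $bytes;
    }

    // Convert the raw bytes to UTF-8; fall back to the raw bytes if iconv
    // does not know the charset or the conversion fails.
    $converted = @iconv($charset, 'UTF-8', $bytes);

    return $converted === false ? $bytes : $converted;
}

echo decodeExtendedValue("ISO-8859-1''caf%E9.txt"), "\n"; // prints "café.txt"
```

The key point is the order of operations: the charset has to be kept until after the percent-decoding, because it describes the decoded bytes.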
I had the same problem but with another config.
$clientManager->setConfig([
    'options' => [
        'decoder' => [
            'message' => 'utf-8',
            'attachment' => 'utf-8',
        ],
    ],
]);
My solution is to convert the name of the attachment like this:
echo mb_convert_encoding($attachment->getName(), 'UTF-8', 'ISO-8859-1');
I did something similar too. The problem with this solution is that we don't know the encoding of the initial string. If it's not ISO-8859-1, we end up with the same issue (the unsupported characters � being replaced by question marks, which may look nicer). This has to be done at the PHP-IMAP level to work properly. Or can we access the raw name (e.g. ISO-8859-1''caf%E9.txt) to extract the encoding ourselves?
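As a stopgap on the application side, the workaround can at least avoid re-encoding names that are already valid UTF-8. The sketch below is only that: toUtf8Name is a hypothetical helper, the ISO-8859-1 fallback is still a guess, and it assumes getName() hands back the raw undecoded bytes (as the results reported in this thread suggest). It does not solve the unknown-charset problem described above.

```php
<?php

/**
 * Hypothetical helper: convert a name to UTF-8 only when it is not
 * already valid UTF-8. The fallback charset is an assumption.
 */
function toUtf8Name(string $name, string $fallbackCharset = 'ISO-8859-1'): string
{
    // Names that are already valid UTF-8 are returned untouched,
    // which avoids double-encoding correctly decoded attachments.
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }

    // Otherwise, guess the source charset and convert.
    return mb_convert_encoding($name, 'UTF-8', $fallbackCharset);
}

echo toUtf8Name("caf\xE9.txt"), "\n";     // ISO-8859-1 bytes, prints "café.txt"
echo toUtf8Name("caf\xC3\xA9.txt"), "\n"; // already UTF-8, prints "café.txt"
```

In the loop from the report, this would be called as toUtf8Name($attachment->getName()).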
Side note: the issue does indeed also happen with the UTF-8 decoder. I've gone back to this decoder: the issues that I had with it were fixed after installing the PHP ldap extension. It would be worth a separate issue on GitHub, but I don't have much time these days. Don't hesitate to get back to me on this subject after the holidays :)