mail-parser If the attachment name is in Cyrillic then TypeError: decoding to str: need a bytes-like object, Header found

Raw mail: https://gist.github.com/yatakoi/77523914f80776a8d3323de73417e767

Environment:

OS: CentOS 7
Docker: no
mail-parser version 3.12.0

Additional context: If the attachment name is in Cyrillic then TypeError: decoding to str: need a bytes-like object, Header found

Traceback (most recent call last):
  File "main.py", line 139, in <module>
    last_uid = get_emails(host, login, password, last_uid=last_uid)
  File "main.py", line 54, in get_emails
    mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 116, in parse_from_bytes
    return MailParser.from_bytes(bt)
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 239, in from_bytes
    return cls(message)
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 136, in __init__
    self.parse()
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 374, in parse
    p.get('content-disposition'))
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/utils.py", line 80, in wrapper
    return normalize('NFC', func(*args, **kwargs))
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/utils.py", line 114, in ported_string
    return six.text_type(raw_data, encoding).strip()
TypeError: decoding to str: need a bytes-like object, Header found

Please, help me.

Aug 15 '20 18:08 yatakoi

I solved this, or at least, found a workaround.

Surround your call with a try and except like this:

message_data = b'\r\n'.join(lines)

try:
    # First attempt: parse the raw bytes directly.
    mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
except Exception as e:
    print('This mail has Cyrillic characters. Trying to parse from string...')
    try:
        # Fallback: decode the bytes ourselves and use the string parser.
        mail = mailparser.parse_from_string(message_data[b"RFC822"].decode('ISO-8859-1'))
    except Exception as e:
        print('This mail is corrupted and cannot be parsed: %s' % str(e))

This way, if the bytes parser fails it will fall back to the string parser and you can change the encoding.

I've been able to parse every single mail thrown at my server this way.
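
The same fallback can be wrapped in a small reusable function. This is only a minimal sketch: the name parse_with_fallback and its parameters are hypothetical and not part of mail-parser, and the two parser callables are passed in so the function can be tried without the library installed.

```python
def parse_with_fallback(raw, parse_bytes, parse_string, encoding="ISO-8859-1"):
    """Try the bytes parser first; on failure, decode and retry as a string.

    Hypothetical helper, not part of mail-parser.
    Returns the parsed mail, or None if both attempts fail.
    """
    try:
        return parse_bytes(raw)
    except Exception as first_error:
        print("Bytes parser failed (%s); retrying from a decoded string..." % first_error)

    try:
        # ISO-8859-1 maps every byte to a code point, so decode() itself never fails.
        return parse_string(raw.decode(encoding))
    except Exception as second_error:
        print("This mail could not be parsed: %s" % second_error)
        return None
```

With mail-parser it would be called as parse_with_fallback(message_data[b"RFC822"], mailparser.parse_from_bytes, mailparser.parse_from_string). Because ISO-8859-1 accepts any byte sequence, names that were really UTF-8 will probably come out mis-decoded (ü becoming Ã¼, for example), so pass a different encoding if the mail's real charset is known.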

Sep 04 '20 13:09 Godlance

Hi. Thank you!

Where should I paste this code?

Sep 06 '20 05:09 yatakoi

Maybe this snippet works only for Python 3. Can you do a PR here?

Sep 06 '20 05:09 fedelemantuano

Sorry, but what is PR?

My script works for Python 3.

Sep 06 '20 05:09 yatakoi

It's a Pull Request: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests

Sep 06 '20 05:09 fedelemantuano

Sorry, it seems I wasn't receiving notifications for this issue correctly.

@yatakoi

In your main.py, line 54, you have the following code:

mail = mailparser.parse_from_bytes(message_data[b"RFC822"])

Replace that line with the snippet I wrote, omitting the message_data = b'\r\n'.join(lines) line.

@fedelemantuano

I did not modify mail-parser, just coded a workaround that goes in my app code. I've never done a PR before so I'm not sure I can help, but if it may solve this issue for everyone I could try.

Oct 17 '20 13:10 Godlance

The develop branch doesn't have any issue. I will release the new version soon.

$ python3.9 -m mailparser -f ~/Downloads/mail_raw -sa -ap ~/Downloads/test

Feb 26 '21 17:02 fedelemantuano

This issue still seems to occur with mail-parser==3.15.0 and German umlauts like ä, ü, or ö, or wrongly decoded strings like Ã¼.

@fedelemantuano was this issue fixed with version 3.15.0?

How to reproduce

Raw email data:

Subject: foobar
To: foobar@example
From: [email protected]
Content-Type: multipart/mixed; boundary=somecontent

--somecontent
Content-Disposition: attachment; filename="Liste übersprungener 1.txt"
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset=utf-8; name="Liste übersprungener 1.txt"

c3R1ZmY=
--somecontent--

Ready to use snippet:

import mailparser

_header = b'Subject: foobar\nTo: foobar@example\nFrom: [email protected]\nContent-Type: multipart/mixed; boundary=somecontent'
_body = b'--somecontent\nContent-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\nContent-Transfer-Encoding: base64\nContent-Type: text/plain; charset=utf-8; name="Liste \xc3\xbcbersprungener 1.txt"\n\nc3R1ZmY=\n--somecontent--\n'


mailparser.parse_from_bytes(_header + b'\n\n' + _body)

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../mailparser/mailparser.py", line 118, in parse_from_bytes
    return MailParser.from_bytes(bt)
  File ".../mailparser/mailparser.py", line 241, in from_bytes
    return cls(message)
  File ".../mailparser/mailparser.py", line 138, in __init__
    self.parse()
  File ".../mailparser/mailparser.py", line 357, in parse
    content_disposition = ported_string(
  File ".../mailparser/utils.py", line 80, in wrapper
    return normalize('NFC', func(*args, **kwargs))
  File ".../mailparser/utils.py", line 114, in ported_string
    return six.text_type(raw_data, encoding)
TypeError: decoding to str: need a bytes-like object, Header found
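
The traceback suggests where the Header object comes from. The following is a sketch using only the standard library, under the assumption that mail-parser builds on email.message_from_bytes with the default (compat32) policy. With that policy, a header containing raw non-ASCII bytes is returned by get() as an email.header.Header instead of a str, and passing that object to str(value, encoding) raises the TypeError shown above.

```python
from email import message_from_bytes
from email.header import Header

# A single-part message whose Content-Disposition header contains raw UTF-8
# bytes (not RFC 2047 / RFC 2231 encoded), as in the report above.
raw = (
    b'Content-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\n'
    b'Content-Type: text/plain; charset=utf-8\n'
    b'\n'
    b'stuff\n'
)

msg = message_from_bytes(raw)  # default policy is compat32
value = msg.get('content-disposition')

# With 8-bit data in the header, compat32 returns a Header object, not a str.
is_header = isinstance(value, Header)

try:
    # Python 3 equivalent of the failing six.text_type(raw_data, encoding) call.
    str(value, 'utf-8')
    error = None
except TypeError as exc:
    error = str(exc)

print(is_header, error)
```

If this matches what happens inside the library, a header written purely in ASCII would not trigger the error, which would explain why only mails with non-ASCII attachment names fail.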

Jun 23 '21 09:06 fechnert

Please send me the raw mail, I can't test it from your snippet.

Jun 23 '21 20:06 fedelemantuano

GitHub won't let me upload *.eml files, so I simply renamed it to txt: mail.txt

import mailparser
with open('mail.txt', 'rb') as infile:
    text = infile.read()
mailparser.parse_from_bytes(text)

Returns the same issue as mentioned above.

Jun 24 '21 06:06 fechnert

Any progress on this?

Jul 23 '21 07:07 fechnert

I'm working on it. I will answer soon.

Jul 25 '21 20:07 fedelemantuano
