If the attachment name is in Cyrillic then TypeError: decoding to str: need a bytes-like object, Header found

Open yatakoi opened this issue 4 years ago • 13 comments

Raw mail: https://gist.github.com/yatakoi/77523914f80776a8d3323de73417e767

Environment:

  • OS: CentOS 7
  • Docker: no
  • mail-parser version 3.12.0

Additional context: If the attachment name is in Cyrillic, parsing fails with TypeError: decoding to str: need a bytes-like object, Header found

Traceback (most recent call last):
  File "main.py", line 139, in <module>
    last_uid = get_emails(host, login, password, last_uid=last_uid)
  File "main.py", line 54, in get_emails
    mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 116, in parse_from_bytes
    return MailParser.from_bytes(bt)
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 239, in from_bytes
    return cls(message)
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 136, in __init__
    self.parse()
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/mailparser.py", line 374, in parse
    p.get('content-disposition'))
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/utils.py", line 80, in wrapper
    return normalize('NFC', func(*args, **kwargs))
  File "/home/m.kostromin/send_tickets/send_tickets/lib64/python3.6/site-packages/mailparser/utils.py", line 114, in ported_string
    return six.text_type(raw_data, encoding).strip()
TypeError: decoding to str: need a bytes-like object, Header found

Please, help me.

yatakoi • Aug 15 '20 18:08

I solved this, or at least, found a workaround.

Surround your call with a try and except like this:

message_data = b'\r\n'.join(lines)

try:
    mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
except Exception as e:
    print('This mail has Cyrillic characters. Trying to parse from string...')
    try:
        mail = mailparser.parse_from_string(message_data[b"RFC822"].decode('ISO-8859-1'))
    except Exception as e:
        print('This mail is corrupted and cannot be parsed: %s' % str(e))

This way, if the bytes parser fails it will fall back to the string parser and you can change the encoding.

I've been able to parse every single mail thrown at my server this way.
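
The fallback described above can also be wrapped in a small helper. This is only a sketch: the function name parse_with_fallback and its parameters are hypothetical, and the two parsers are passed in as arguments so the example runs without mail-parser installed.

```python
def parse_with_fallback(raw, parse_bytes, parse_string, encoding="ISO-8859-1"):
    """Try the bytes parser first; on failure, decode and try the string parser.

    parse_bytes and parse_string are supplied by the caller (hypothetical
    parameters for this sketch). Returns None if both attempts fail.
    """
    try:
        return parse_bytes(raw)
    except Exception as first_error:
        print("Bytes parser failed (%s); retrying from string..." % first_error)

    try:
        # ISO-8859-1 maps every byte to a character, so decode() itself cannot
        # fail, but UTF-8 text decoded this way comes out as mojibake.
        return parse_string(raw.decode(encoding))
    except Exception as second_error:
        print("This mail cannot be parsed: %s" % second_error)
        return None
```

In an application it could be called as parse_with_fallback(message_data[b"RFC822"], mailparser.parse_from_bytes, mailparser.parse_from_string). Because the ISO-8859-1 fallback never raises on decoding, names that were really UTF-8 may come back garbled (for example ü as Ã¼), so the result is worth checking.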

Godlance • Sep 04 '20 13:09

Hi. Thank you!

Where should I paste this code?

yatakoi • Sep 06 '20 05:09

Maybe this snippet works only for Python 3. Can you do a PR here?

fedelemantuano • Sep 06 '20 05:09

Sorry, but what is a PR?

My script works for Python 3.

yatakoi • Sep 06 '20 05:09

It's a Pull Request: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests

fedelemantuano • Sep 06 '20 05:09

Sorry, it seems I wasn't receiving notifications for this issue correctly.

@yatakoi

In your main.py, line 54, you have the following code:

mail = mailparser.parse_from_bytes(message_data[b"RFC822"])

Replace that line with the snippet I wrote, omitting the message_data = b'\r\n'.join(lines) line.

@fedelemantuano

I did not modify mail-parser, just coded a workaround that goes in my app code. I've never done a PR before so I'm not sure I can help, but if it may solve this issue for everyone I could try.

Godlance • Oct 17 '20 13:10

The develop branch doesn't have any issue. I will release the new version soon.

$ python3.9 -m mailparser -f ~/Downloads/mail_raw -sa -ap ~/Downloads/test

fedelemantuano • Feb 26 '21 17:02

This issue still seems to occur with mail-parser==3.15.0 and German umlauts such as ä, ü, or ö, or with wrongly decoded strings like ü.

@fedelemantuano was this issue fixed with version 3.15.0?


How to reproduce

Raw email data:

Subject: foobar
To: foobar@example
From: [email protected]
Content-Type: multipart/mixed; boundary=somecontent

--somecontent
Content-Disposition: attachment; filename="Liste übersprungener 1.txt"
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset=utf-8; name="Liste übersprungener 1.txt"

c3R1ZmY=
--somecontent--

Ready-to-use snippet:

import mailparser

_header = b'Subject: foobar\nTo: foobar@example\nFrom: [email protected]\nContent-Type: multipart/mixed; boundary=somecontent'
_body = b'--somecontent\nContent-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\nContent-Transfer-Encoding: base64\nContent-Type: text/plain; charset=utf-8; name="Liste \xc3\xbcbersprungener 1.txt"\n\nc3R1ZmY=\n--somecontent--\n'


mailparser.parse_from_bytes(_header + b'\n\n' + _body)

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../mailparser/mailparser.py", line 118, in parse_from_bytes
    return MailParser.from_bytes(bt)
  File ".../mailparser/mailparser.py", line 241, in from_bytes
    return cls(message)
  File ".../mailparser/mailparser.py", line 138, in __init__
    self.parse()
  File ".../mailparser/mailparser.py", line 357, in parse
    content_disposition = ported_string(
  File ".../mailparser/utils.py", line 80, in wrapper
    return normalize('NFC', func(*args, **kwargs))
  File ".../mailparser/utils.py", line 114, in ported_string
    return six.text_type(raw_data, encoding)
TypeError: decoding to str: need a bytes-like object, Header found
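
The same TypeError can be reproduced with the standard library alone. This seems to explain the mechanism, assuming mail-parser hands the value of p.get('content-disposition') to str(raw_data, encoding), as the traceback suggests. Under the default compat32 policy, a header that contains raw non-ASCII bytes is returned as an email.header.Header object rather than a str. The message below is a shortened variant of the reproduction data, not the original mail.

```python
import email
from email.header import Header

# Shortened variant of the reproduction mail: the filename contains the raw
# UTF-8 bytes for "ü" instead of an RFC 2047 / RFC 2231 encoded form.
raw = (
    b"Subject: foobar\n"
    b"Content-Type: multipart/mixed; boundary=somecontent\n"
    b"\n"
    b"--somecontent\n"
    b'Content-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\n'
    b"Content-Type: text/plain; charset=utf-8\n"
    b"\n"
    b"stuff\n"
    b"--somecontent--\n"
)

msg = email.message_from_bytes(raw)
part = msg.get_payload()[0]
value = part.get("content-disposition")

# For headers with non-ASCII bytes, compat32 returns a Header, not a str.
is_header = isinstance(value, Header)

try:
    # Mirrors the failing call shape: str(obj, encoding) needs bytes-like input.
    str(value, "utf-8")
    error = None
except TypeError as exc:
    error = str(exc)

print(is_header, error)
```

If this is indeed the cause, any caller that passes such a value to str() with an encoding argument would need to check for Header objects first.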

fechnert • Jun 23 '21 09:06

Please send me the raw mail, I can't test it from your snippet.

fedelemantuano • Jun 23 '21 20:06

GitHub won't let me upload *.eml files, so I simply renamed it to .txt: mail.txt

import mailparser
with open('mail.txt', 'rb') as infile:
    text = infile.read()
mailparser.parse_from_bytes(text)

Returns the same issue as mentioned above.

fechnert • Jun 24 '21 06:06

Any progress on this?

fechnert • Jul 23 '21 07:07

I'm working on it. I will answer soon.

fedelemantuano • Jul 25 '21 20:07