mail-parser
If the attachment name is in Cyrillic then TypeError: decoding to str: need a bytes-like object, Header found
Raw mail: https://gist.github.com/yatakoi/77523914f80776a8d3323de73417e767
Environment:
- OS: CentOS 7
- Docker: no
- mail-parser version 3.12.0
Additional context: If the attachment name is in Cyrillic, parsing fails with TypeError: decoding to str: need a bytes-like object, Header found
Traceback (most recent call last):
File "main.py", line 139, in
Please, help me.
I solved this, or at least, found a workaround.
Surround your call with a try and except like this:
message_data = b'\r\n'.join(lines)
try:
    mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
except Exception as e:
    # The bytes parser failed (e.g. non-ASCII attachment name); retry with a decoded string.
    print('This mail has Cyrillic characters. Trying to parse from string...')
    try:
        mail = mailparser.parse_from_string(message_data[b"RFC822"].decode('ISO-8859-1'))
    except Exception as e:
        print('This mail is corrupted and cannot be parsed: %s' % str(e))
This way, if the bytes parser fails it will fall back to the string parser and you can change the encoding.
I've been able to parse every single mail thrown at my server this way.
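If you want to reuse the fallback in more than one place, it can be wrapped in a small helper. This is only a sketch: `parse_with_fallback` and its parameters are names made up for illustration, not part of mail-parser, and the list of encodings is an assumption you should adapt to your mail.

```python
def parse_with_fallback(raw, parse_bytes, parse_string,
                        encodings=("utf-8", "ISO-8859-1")):
    """Try the bytes parser first, then the string parser per encoding.

    parse_bytes and parse_string are callables supplied by the caller
    (hypothetical parameters, not a mail-parser API).
    """
    try:
        return parse_bytes(raw)
    except Exception as first_error:
        last_error = first_error

    for encoding in encodings:
        try:
            # decode() itself may fail (e.g. invalid UTF-8); that also
            # moves us on to the next encoding.
            return parse_string(raw.decode(encoding))
        except Exception as exc:
            last_error = exc

    # Nothing worked: surface the last failure instead of hiding it.
    raise ValueError("mail could not be parsed") from last_error
```

With mail-parser this would be called as `parse_with_fallback(message_data[b"RFC822"], mailparser.parse_from_bytes, mailparser.parse_from_string)`. Unlike the snippet above, it raises when every attempt fails, so `mail` is never left undefined.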
Hi. Thank you!
Where should I paste this code?
Maybe this snippet works only for Python 3. Can you do a PR here?
Sorry, but what is PR?
My script works for Python 3.
It's a Pull Request: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests
Sorry, it seems I wasn't receiving notifications for this issue correctly.
@yatakoi
In your main.py, line 54, you have the following code:
mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
Replace that line with the snippet I wrote, omitting the message_data = b'\r\n'.join(lines) line.
@fedelemantuano
I did not modify mail-parser, just coded a workaround that goes in my app code. I've never done a PR before so I'm not sure I can help, but if it may solve this issue for everyone I could try.
The develop branch doesn't have any issue. I will release the new version soon.
$ python3.9 -m mailparser -f ~/Downloads/mail_raw -sa -ap ~/Downloads/test
This issue still seems to occur with mail-parser==3.15.0 and German umlauts like ä, ü, or ö, or wrongly decoded strings like ü.
@fedelemantuano was this issue fixed with version 3.15.0?
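A side note on wrongly decoded strings: decoding UTF-8 bytes as ISO-8859-1 (as the workaround above does) never raises, but it turns every non-ASCII character into two unrelated ones. A minimal sketch in plain Python, with no mail-parser involved and a German file name chosen as an example:

```python
name = "Liste übersprungener 1.txt"

raw = name.encode("utf-8")        # the bytes as they would appear in the mail
wrong = raw.decode("ISO-8859-1")  # no exception, but 'ü' turns into 'Ã¼'

# The damage is reversible as long as nothing else touched the string.
repaired = wrong.encode("ISO-8859-1").decode("utf-8")

print(wrong)
print(repaired)
```

So the ISO-8859-1 fallback gets a mail parsed, but attachment names that were really UTF-8 may come out garbled and need a round trip like the one above.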
How to reproduce
Raw email data:
Subject: foobar
To: foobar@example
From: [email protected]
Content-Type: multipart/mixed; boundary=somecontent

--somecontent
Content-Disposition: attachment; filename="Liste übersprungener 1.txt"
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset=utf-8; name="Liste übersprungener 1.txt"

c3R1ZmY=
--somecontent--
Ready to use snippet:
import mailparser
_header = b'Subject: foobar\nTo: foobar@example\nFrom: [email protected]\nContent-Type: multipart/mixed; boundary=somecontent'
_body = b'--somecontent\nContent-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\nContent-Transfer-Encoding: base64\nContent-Type: text/plain; charset=utf-8; name="Liste \xc3\xbcbersprungener 1.txt"\n\nc3R1ZmY=\n--somecontent--\n'
mailparser.parse_from_bytes(_header + b'\n\n' + _body)
Output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../mailparser/mailparser.py", line 118, in parse_from_bytes
    return MailParser.from_bytes(bt)
  File ".../mailparser/mailparser.py", line 241, in from_bytes
    return cls(message)
  File ".../mailparser/mailparser.py", line 138, in __init__
    self.parse()
  File ".../mailparser/mailparser.py", line 357, in parse
    content_disposition = ported_string(
  File ".../mailparser/utils.py", line 80, in wrapper
    return normalize('NFC', func(*args, **kwargs))
  File ".../mailparser/utils.py", line 114, in ported_string
    return six.text_type(raw_data, encoding)
TypeError: decoding to str: need a bytes-like object, Header found
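The TypeError on the last line of this traceback can be reproduced with the standard library alone, which hints at where it comes from. This is a sketch under an assumption: that mail-parser reads headers through the `email` package with its default (compat32) policy. I have not verified that against the mail-parser source. With that policy, a header containing raw non-ASCII bytes is returned as an `email.header.Header` object instead of a `str`, and `str(obj, encoding)` only accepts bytes-like objects.

```python
import email
from email.header import Header

raw = (
    b"Subject: foobar\n"
    b"Content-Type: multipart/mixed; boundary=somecontent\n"
    b"\n"
    b"--somecontent\n"
    b'Content-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\n'
    b"Content-Transfer-Encoding: base64\n"
    b'Content-Type: text/plain; charset=utf-8; name="Liste \xc3\xbcbersprungener 1.txt"\n'
    b"\n"
    b"c3R1ZmY=\n"
    b"--somecontent--\n"
)

msg = email.message_from_bytes(raw)   # default policy is compat32
part = msg.get_payload()[0]           # the single attachment part
value = part.get("content-disposition")

print(type(value))                    # a Header object, not a str

try:
    # Same kind of call as six.text_type(raw_data, encoding) on Python 3.
    str(value, "utf-8")
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

If that assumption holds, any header with unencoded 8-bit characters (Cyrillic, umlauts, and so on) would hit the same path, while names encoded per RFC 2047 or RFC 2231 would not.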
Please send me the raw mail, I can't test it from your snippet.
GitHub won't let me upload *.eml files, so I simply renamed it to .txt: mail.txt
import mailparser
with open('mail.txt', 'rb') as infile:
    text = infile.read()
mailparser.parse_from_bytes(text)
This raises the same error as shown above.
Any progress on this?
I'm working on it. I will answer soon.