pyzmail crashes when parsing a mail with badly encoded UTF-8 header
Hello aspineux,
I am getting a crash in pyzmail on a rare, malformed mail file. It looks like pyzmail is mishandling such files.
Details: Python 3.4.2, pyzmail 1.0.3, Linux (Debian 8)
$ grep version /usr/lib/python3.4/email/
version = '5.1.0'
Crash reason: If a header cannot be decoded (the UTF-8 is badly encoded), Compat32._sanitize_header() does not return a string but an instance of the class email.header.Header. That causes pyzmail to crash when it tries to call startswith() on the Header object.
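The behaviour described above can be seen with the standard library alone, without pyzmail. The following is only a minimal sketch: the one-line message in `raw` is a made-up stand-in for the attached file (which is not reproduced here), and the lone 0xE9 byte is an assumed example of badly encoded UTF-8.

```python
import email
from email.header import Header

# Hypothetical stand-in for the attached mail: 0xE9 on its own is not valid UTF-8.
raw = b"Subject: caf\xe9\r\n\r\nbody\r\n"

# The default (compat32) policy is used when no policy is given.
msg = email.message_from_bytes(raw)
value = msg["Subject"]

# The header contains undecodable bytes, so a Header object comes back, not a str.
print(type(value))

# Header has no str methods, so a call such as value.startswith(...) would fail.
print(hasattr(value, "startswith"))

# Converting explicitly gives a plain string again (bad bytes are replaced).
as_text = str(value)
print(type(as_text))
```

Code that expects header values to always be `str` would therefore need to convert the value first, for example with `str(value)`. Whether that is the right fix for pyzmail is not something this sketch settles.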
Reproduce: Take the attached file and run from python3:
import pyzmail
pyzmail.message_from_bytes(open('/tmp/mail_utf8_error', 'rb').read())
Traceback (most recent call last):
File "
No problem for me if I use:
# pyzmail 1.0.3 - Python 3.4.3 - windows 7
>>> msg = pyzmail.message_from_file(open('mail_utf8_error'))
>>> msg.as_string()
'Return-Path: <[email protected]>\nReceived: from ( [])\n\tby i-sgcore01-poc-server.c.trusty-catbird-121621.internal (Postfix) with ESMTPS i
d 020B9408FA\n\tfor <[email protected]>; Fri, 13 May 2016 17:13:48 +0300 (IDT)\nReceived: by with SMTP id gw7so150349649pac.0\n for <[email protected]>; Fri, 13 May 2016
07:13:47 -0700 (PDT)\nX-Original-Authentication-Results:; spf=pass ( domain of [email protected] designates as permitted sender) smtp.mailfrom=a@s
.com\nX-Received: by with SMTP id e127mr22852924pfa.81.1463148827242;\n Fri, 13 May 2016 07:13:47 -0700 (PDT)\nX-Received: by with SMTP id e127mr22
852810pfa.81.1463148826374;\n Fri, 13 May 2016 07:13:46 -0700 (PDT)\nReceived: from ( [])\n
by with SMTP id i10si24956178paz.90.2016.\n for <[email protected]>;\n Fri, 13 May 2016 07:13:46 -0700 (PDT)\nReceived-SPF: pass (google.c
om: domain of [email protected] designates as permitted sender) client-ip=;\nAuthentication-Results:;\n spf=pass ( domain of [email protected] d
esignates as permitted sender) [email protected]\nReceived: (qmail 2115 invoked by uid 0); 13 May 2016 14:12:47 -0000\nReceived: from unknown (HELO cmgw4) (10.0.90
.85)\n by with SMTP; 13 May 2016 14:12:47 -0000\nReceived: from ([])\n\tby cmgw4 with\n\tid tqCi1s00d4Z6XqA01qCly4; Fri, 13 May 2016 08
:12:47 -0600\nReceived: from [] (port=61165 helo=LocalHost)\n\tby with esmtpsa (TLSv1:AES128-SHA:128)\n\t(Exim 4.86_2)\n\t(envelope-from <[email protected]>
)\n\tid 1b1DpX-0008LG-HL\n\tfor [email protected]; Fri, 13 May 2016 08:12:42 -0600\nMessage-ID: <[email protected]>\nFrom: "Ms.A" <[email protected]>\nReply-To:
<[email protected]>\nTo: "E" <[email protected]>\nSubject: Re:Hi E,Greetings from S A\nDate: Fri, 13 May 2016 22:22:24 +0800\nMIME-Version: 1.0\nX-Priority: 3\nX-Mailer: Joinf MailSystem 8.0\nConten
t-Type: multipart/related;\n\ttype="multipart/alternative";\n\tboundary="Mark=_217952388210897619413514"\nX-Identified-User: {} {sentby:smtp auth
authed with [email protected]}\n\n\n--Mark=_2179523882108976194183049--\n\n--Mark=_217952388210897619413514\nContent-Type: image/jpg;\n\tname="=?utf-8?Q?=E5=95=86=E5=AF=8Clog.jpg?="\nContent
-Transfer-Encoding: base64\nContent-ID: =?utf-8?b?PMOJw4zCuMK7bG9nLmpwZ0A0MjUwMy42NDA2NjA2NDgxLjY1?=\n\n\n--Mark=_217952388210897619413514--\n'
Thanks for checking, srault95. Maybe it's a Windows/Linux difference? On my Debian it happens with Python 3.4.3, pyzmail 1.0.3. Using pyzmail.message_from_file() on that file also raises an exception.
Thank you amikoren, I can reproduce the problem and I will provide a fix soon.
srault95, you are using the "old python2" interface (aka the text interface)
msg = pyzmail.message_from_file(open('mail_utf8_error')) vs
msg = pyzmail.message_from_bytes(open('/tmp/mail_utf8_error', 'rb').read())
With open('mail_utf8_error') the file is opened in text mode, so its content is decoded using your local encoding, and pyzmail and the email library then work on a different set of data.
What amikoren is doing is more like this
msg = pyzmail.message_from_string(open(r'm:\tmp\mail_utf8_error','rb').read().decode('utf-8'))
And what you are doing is more like this
msg = pyzmail.message_from_string(open(r'm:\tmp\mail_utf8_error','rb').read().decode('cp1252'))
Replace 'cp1252' with your local Windows encoding.
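To illustrate why the two interfaces see different data, here is a small sketch using only the standard email package. The message in `raw` is a hypothetical stand-in for the real file, and 'cp1252' is just the assumed local Windows encoding from the lines above.

```python
import email
from email.header import Header

# Hypothetical stand-in for the mail file: 0xE9 alone is invalid as UTF-8.
raw = b"Subject: caf\xe9\r\n\r\nbody\r\n"

# Strict UTF-8 decoding rejects the bad byte.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# cp1252 maps 0xE9 to a character, so decoding succeeds and the bad byte is hidden.
text = raw.decode("cp1252")

# Text interface: the parser only ever sees already-decoded text.
from_text = email.message_from_string(text)["Subject"]

# Bytes interface: the parser sees the original bytes, including the bad one.
from_bytes = email.message_from_bytes(raw)["Subject"]

print(utf8_failed)
print(type(from_text).__name__, repr(from_text))
print(type(from_bytes).__name__)
```

In this sketch the text path yields an ordinary str while the bytes path yields a Header object, which would be consistent with the crash showing up only through message_from_bytes().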