mail-parser
When parsing an eml attachment from Gmail, the attachment is parsed as an email instead of as an attachment
Describe the bug When parsing a message with an eml attachment (where the attached eml itself contains an image, for example), the eml is parsed as a message instead of as an attachment, so the image inside it is reported as an attachment of the outer message.
To Reproduce Steps to reproduce the behavior:
- Create an email with an image attachment, send it to an available inbox.
- Forward the email from step 1 as an attachment.
- Run parse_from_bytes / parse_from_file / parse_from_string on the forwarded message.
- See that the only attachment is the image, not the eml.
Expected behavior I expect to see all attachments, including the eml file itself.
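For anyone who cannot open the zipped sample, a similar message can be built with the standard library alone. This is only a sketch: the helper names, subjects, addresses, and file names are made up for illustration, and it does not call mail-parser at all. It shows how Python's email package sees the forwarded eml: the message/rfc822 part reports is_multipart() as True, so code that only treats non-multipart parts as attachments will pick up the inner image and skip the eml.

```python
from email import message_from_bytes
from email.message import EmailMessage


def build_forwarded_as_attachment():
    """Build an outer email that carries another email (with an image) as an attachment."""
    inner = EmailMessage()
    inner["Subject"] = "inner with image"
    inner["From"] = "a@example.com"
    inner["To"] = "b@example.com"
    inner.set_content("see image")
    # A few PNG magic bytes stand in for a real image.
    inner.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image",
                         subtype="png", filename="pic.png")

    outer = EmailMessage()
    outer["Subject"] = "Fwd as attachment"
    outer["From"] = "b@example.com"
    outer["To"] = "c@example.com"
    outer.set_content("forwarded message attached")
    # Passing a message object attaches it as message/rfc822.
    outer.add_attachment(inner, filename="inner.eml")
    return outer.as_bytes()


def describe_parts(raw):
    """Return (content type, is_multipart, filename) for every part in walk order."""
    msg = message_from_bytes(raw)
    return [(p.get_content_type(), p.is_multipart(), p.get_filename())
            for p in msg.walk()]


if __name__ == "__main__":
    for row in describe_parts(build_forwarded_as_attachment()):
        print(row)
```

In the printed rows, the only non-multipart part with a file name is the image, which matches the behavior described above.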
Raw mail
testing eml parsing with attachment copy.eml.zip
Environment:
- OS: Linux
- Docker: [yes or no]
- mail-parser version 3.15.0
Additional context I used this code to parse it:
from mailparser import parse_from_bytes


class mailParser:
    def run(self, raw_email):
        # Parse the raw message bytes into a MailParser object.
        return parse_from_bytes(bytes(raw_email))


parser = mailParser()
# Read the forwarded message from disk as raw bytes.
with open('testing eml parsing with attachment.eml', 'rb') as fhdl:
    raw_email = fhdl.read()
res = parser.run(raw_email)
print(res)
The result: one attachment (the inner image).
Possible solution:
I tried changing line 353 of mailparser.py to:
if not p.is_multipart() or 'attachment' in p.get('content-disposition', ''):
so that parts marked as attachments are processed as attachments even when they are multipart (a forwarded eml is typically a message/rfc822 part, which Python's email package reports as multipart).
I also added this at line 279:
is_attachment = True
payload = p.get_payload()
filename = dict(payload[0]._headers).get('Subject')
As a result, I got:
(2 attachments, both the image and the eml file)
Hello, I have the same problem... Regards
Thanks for this submission. I will check the issue.