MessageAttachment not created for message/rfc822 part despite of Content-Disposition: attachment
Incoming multipart email containing text/html, text/plain and message/rfc822 (with Content-Disposition set to attachment). I would expect to get the message/rfc822 part saved as a MessageAttachment and get an .eml file for it. What happens is instead that the message/rfc822 part gets parsed recursively and it's "merged" into the containing message.
I'm not sure this is the intended behavior or just a corner case. Would it be possible to have this behavior adjusted or configurable?
Below an attempt i made to fix it while trying to understand where the problem was. I'm not very comfortable with email standards so I would really be grateful if someone with more experience could review it and possibly include a fix in one of the next releases.
I pointed out with "EDIT" the changes I made (the first is just one line the second is an elif branch I added).
def _get_dehydrated_message(self, msg, record):
settings = utils.get_settings()
new = EmailMessage()
# EDIT: I added the Content-Disposition check to avoid the recursive parse of attached message/rfc822
if msg.is_multipart() and not 'attachment' in msg.get('Content-Disposition', ''):
for header, value in msg.items():
new[header] = value
for part in msg.get_payload():
new.attach(
self._get_dehydrated_message(part, record)
)
elif (
settings['strip_unallowed_mimetypes']
and not msg.get_content_type() in settings['allowed_mimetypes']
):
for header, value in msg.items():
new[header] = value
# Delete header, otherwise when attempting to deserialize the
# payload, it will be expecting a body for this.
del new['Content-Transfer-Encoding']
new[settings['altered_message_header']] = (
'Stripped; Content type %s not allowed' % (
msg.get_content_type()
)
)
new.set_payload('')
# EDIT: added this elif branch (which is very similar to the attachment and could be refactored)
elif msg.get_content_type() == 'message/rfc822' and 'attachment' in msg.get('Content-Disposition', ''):
extension = ".eml"
attachment = MessageAttachment()
# can 'message/rfc822' have anything else than one single part? if so this is not safe and needs to be handled properly
msg_part = msg.get_payload(0)
attachment.document.save(
uuid.uuid4().hex + extension,
ContentFile(
six.BytesIO(
msg_part.as_string(unixfrom=False) # see django-mailbox issue #100
).getvalue()
)
)
attachment.message = record
for key, value in msg.items():
attachment[key] = value
attachment.save()
placeholder = EmailMessage()
placeholder[
settings['attachment_interpolation_header']
] = str(attachment.pk)
new = placeholder
elif (
(
msg.get_content_type() not in settings['text_stored_mimetypes']
) or
('attachment' in msg.get('Content-Disposition', ''))
):
filename = None
# ... original code ...
Ooo, this is definitely a bug; could you post this as a pull request so I can more-easily review it? If you're not super familiar with how that works, I can help walk you through the process if you hop into the gitter.im channel.
Cheers!
Thanks for the quick reply! Pull request submitted.