msg_obj.save_email_file() is saving eml with empty attachments
Hello,
I have been experimenting with the library to load .msg files and convert them to .eml using the msg_obj.save_email_file() method. The .msg file loads normally and no errors are raised.
The method correctly saves the bodies and metadata of the emails in the .eml file, but every attachment in the saved file is empty: it contains no data at all. Is this a known issue?
Thanks.
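One way to confirm the symptom is to parse the saved .eml with the standard library and list the decoded size of each attachment. The sketch below is only an illustration: `attachment_sizes` is a hypothetical helper, not part of msg_parser, and the sample message is built in memory so the snippet runs without a .msg file.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_sizes(eml_bytes):
    """Hypothetical helper: map each attachment filename to its decoded size in bytes."""
    msg = message_from_bytes(eml_bytes, policy=policy.default)
    sizes = {}
    for part in msg.iter_attachments():
        # get_payload(decode=True) undoes the Content-Transfer-Encoding (e.g. Base64)
        payload = part.get_payload(decode=True) or b""
        sizes[part.get_filename()] = len(payload)
    return sizes


# Sample message with one healthy attachment and one empty attachment.
sample = EmailMessage()
sample["Subject"] = "demo"
sample.set_content("body text")
sample.add_attachment(
    b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png", filename="ok.png"
)
sample.add_attachment(
    b"", maintype="application", subtype="octet-stream", filename="empty.bin"
)

print(attachment_sizes(sample.as_bytes()))
```

For a file written by save_email_file(), the same helper could be called on the bytes read from that file; a size of 0 for every attachment would match the behaviour described above.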
Hello,
I have found the issue: the attachments, which are read as binary files, were being corrupted when passed to the olefile library in Message._get_property_data() in the msg_parser.py file. I have corrected the method:
def _get_property_data(self, directory_name, directory_entry, is_list=False):
    directory_entry_name = directory_entry.name
    if is_list:
        stream_name = [directory_name, directory_entry_name]
    else:
        stream_name = [directory_entry_name]
    ole_file = directory_entry.olefile
    property_details = self._get_canonical_property_name(directory_entry_name)
    if not property_details:
        return None
    property_name = property_details.get("name")
    property_type = property_details.get("data_type")
    if not property_type:
        return None
    try:
        raw_content = ole_file.openstream(stream_name).read()
    except IOError:
        raw_content = None
    property_value = self._data_model.get_value(
        raw_content, data_type=property_type
    )
    if property_value:
        # If the property is the attachment data, it has to be provided raw to prevent corruption
        if property_name == 'AttachDataObject':
            property_detail = {property_name: raw_content}
        # Otherwise use the olefile lib to get the value
        else:
            property_detail = {property_name: property_value}
    else:
        property_detail = None
    return property_detail
Then, the EmailFormatter._process_attachments() method in the email_builder module should not decode the byte stream:
def _process_attachments(self, attachments):
    for attachment in attachments:
        ctype = attachment.AttachMimeTag
        data = attachment.data
        filename = attachment.Filename
        maintype, subtype = ctype.split("/", 1)
        if data is None:
            continue
        # Next lines corrupt binary files and make them unreadable
        # if isinstance(data, bytes):
        #     data = data.decode("utf-8", "ignore")
        if maintype == "text" or "message" in maintype:
            attach = MIMEText(data, _subtype=subtype)
        elif maintype == "image":
            attach = MIMEImage(data, _subtype=subtype)
        elif maintype == "audio":
            attach = MIMEAudio(data, _subtype=subtype)
        else:
            attach = MIMEBase(maintype, subtype)
            attach.set_payload(data)
            # Encode the payload using Base64
            encoders.encode_base64(attach)
        # Set the filename parameter
        base_filename = os.path.basename(filename)
        attach.add_header("Content-ID", "<{}>".format(base_filename))
        attach.add_header(
            "Content-Disposition", "attachment", filename=base_filename
        )
        self.message.attach(attach)
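To illustrate why the decode step is destructive for binary data, here is a small standalone sketch. It uses the first eight bytes of a PNG file as sample input (an assumption chosen for illustration, not data from the library) and shows that decoding with errors="ignore" drops bytes that cannot be recovered afterwards.

```python
# First bytes of a PNG file; the leading 0x89 byte is not valid UTF-8.
original = b"\x89PNG\r\n\x1a\n"

# Same call as the commented-out lines: invalid bytes are silently dropped.
as_text = original.decode("utf-8", "ignore")

# Encoding the text again does not bring the dropped byte back.
round_tripped = as_text.encode("utf-8")

print(len(original), len(round_tripped))
print(round_tripped == original)
```

The round-tripped data is shorter than the original, so any binary attachment handled this way no longer matches its source file.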
Thanks.
@danieldiezmallo Thank you for finding the bug. Can you open a PR for the above change?
I just opened a PR with a similar bug fix, but kept the bytes decoding when the MIME type is text. Tests ran fine with Python 3.10 on Windows 10. I'd be very grateful if you could merge the PR, bump the version to 1.2.1, and publish it to PyPI.
Many thanks for this project!
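For readers following along, the idea of decoding only text attachments could look roughly like the sketch below. This is not the code from the PR: `build_attachment_part` is a hypothetical standalone helper, it assumes UTF-8 for text attachments, and for brevity it sends every non-text type through MIMEBase instead of using MIMEImage and MIMEAudio.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.text import MIMEText


def build_attachment_part(data, mime_tag, filename):
    """Hypothetical helper: decode bytes only for text/* attachments."""
    maintype, subtype = mime_tag.split("/", 1)
    if maintype == "text":
        # MIMEText expects str, so bytes are decoded here (UTF-8 is an assumption).
        if isinstance(data, bytes):
            data = data.decode("utf-8", "ignore")
        part = MIMEText(data, _subtype=subtype, _charset="utf-8")
    else:
        # Binary data stays as raw bytes and is Base64-encoded for transport.
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    return part


png_bytes = b"\x89PNG\r\n\x1a\n"
binary_part = build_attachment_part(png_bytes, "image/png", "pixel.png")
text_part = build_attachment_part(b"hello", "text/plain", "note.txt")

# Decoding the transfer encoding should give back the original bytes.
print(binary_part.get_payload(decode=True) == png_bytes)
print(text_part.get_payload(decode=True) == b"hello")
```

Keeping the decode step for text avoids passing bytes to MIMEText, while binary attachments keep their original bytes.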
Has this been released? :)