
msg_obj.save_email_file() is saving eml with empty attachments

Open danieldiezmallo opened this issue 4 years ago • 5 comments

Hello,

I have been experimenting with the library to load .msg files and convert them to .eml using the msg_obj.save_email_file() method. The .msg file loads normally and the conversion completes without errors.

The method correctly saves the bodies and metadata of the emails in the .eml file, but every attachment in the saved file is empty: the attachment entries are present, yet they contain no data at all. Is this a bug?
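One way to confirm the symptom is to parse the resulting .eml and print the decoded size of each attachment. The sketch below uses only the standard library `email` package and builds a small message in memory as a stand-in for a converted file; the helper name `attachment_sizes` and the demo file names are hypothetical and not part of msg_parser.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_sizes(eml_bytes):
    """Return {filename: decoded payload size in bytes} for each attachment."""
    msg = message_from_bytes(eml_bytes, policy=policy.default)
    sizes = {}
    for part in msg.iter_attachments():
        # decode=True undoes the transfer encoding (e.g. base64)
        payload = part.get_payload(decode=True) or b""
        sizes[part.get_filename()] = len(payload)
    return sizes


# In-memory stand-in for a converted .eml file (assumption: not real output)
demo = EmailMessage()
demo["Subject"] = "demo"
demo.set_content("body text")
demo.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image",
                    subtype="png", filename="pixel.png")
demo.add_attachment(b"", maintype="application",
                    subtype="octet-stream", filename="empty.bin")

sizes = attachment_sizes(demo.as_bytes())
empty = [name for name, size in sizes.items() if size == 0]
print(sizes, empty)
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same helper; any name listed in `empty` is an attachment that was written without content.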

Thanks.

danieldiezmallo avatar Jul 14 '21 09:07 danieldiezmallo

Hello,

I have found the issue: the attachments, which are read as binary files, were being corrupted when passed to the olefile library in Message._get_property_data(), in the msg_parser.py file. I have corrected the method:

def _get_property_data(self, directory_name, directory_entry, is_list=False):
        directory_entry_name = directory_entry.name
        if is_list:
            stream_name = [directory_name, directory_entry_name]
        else:
            stream_name = [directory_entry_name]

        ole_file = directory_entry.olefile
        property_details = self._get_canonical_property_name(directory_entry_name)
        if not property_details:
            return None

        property_name = property_details.get("name")
        property_type = property_details.get("data_type")
        if not property_type:
            return None

        try:
            raw_content = ole_file.openstream(stream_name).read()
        except IOError:
            raw_content = None
        property_value = self._data_model.get_value(
            raw_content, data_type=property_type
        )
        if property_value:
            
            # If the property is the attachment data, it has to be passed through raw to prevent corruption
            if property_name == 'AttachDataObject':
                property_detail = {property_name: raw_content}
            # Otherwise, use the parsed value
            else:
                property_detail = {property_name: property_value}
        else:
            property_detail = None
        return property_detail

Then the EmailFormatter._process_attachments() method, in the email_builder module, should not decode the byte stream:

def _process_attachments(self, attachments):
        for attachment in attachments:
            ctype = attachment.AttachMimeTag
            data = attachment.data
            filename = attachment.Filename
            maintype, subtype = ctype.split("/", 1)
                        
            if data is None:
                continue

# The next lines corrupt binary files and make them unreadable
#             if isinstance(data, bytes):
#                 data = data.decode("utf-8", "ignore")
    
            if maintype == "text" or "message" in maintype:
                attach = MIMEText(data, _subtype=subtype)
            elif maintype == "image":
                attach = MIMEImage(data, _subtype=subtype)
            elif maintype == "audio":
                attach = MIMEAudio(data, _subtype=subtype)
            else:
                attach = MIMEBase(maintype, subtype)
                attach.set_payload(data)

                # Encode the payload using Base64
                encoders.encode_base64(attach)
            # Set the filename parameter
            base_filename = os.path.basename(filename)
            attach.add_header("Content-ID", "<{}>".format(base_filename))
            attach.add_header(
                "Content-Disposition", "attachment", filename=base_filename
            )
            self.message.attach(attach)
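As a small standalone illustration of why the commented-out decode step is harmful for binary data (it does not use msg_parser itself): decoding arbitrary bytes as UTF-8 with errors ignored silently drops every byte that is not valid UTF-8, so the original content cannot be recovered.

```python
# First 8 bytes of a PNG file; 0x89 is not valid as a standalone UTF-8 byte
png_header = b"\x89PNG\r\n\x1a\n"

# Same call as the removed lines: invalid bytes are discarded, not preserved
decoded = png_header.decode("utf-8", "ignore")
round_tripped = decoded.encode("utf-8")

lost = len(png_header) - len(round_tripped)
print(round_tripped, lost)  # the leading 0x89 byte is gone
```

Real attachments such as images, PDFs, or archives contain many such bytes, so the result is a shorter, unreadable file.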

Thanks.

danieldiezmallo avatar Jul 14 '21 14:07 danieldiezmallo

@danieldiezmallo Thank you for finding the bug. Can you open a PR for the above change?

vikramarsid avatar Jul 27 '21 22:07 vikramarsid

I just opened a PR with a similar bug fix, but I kept the bytes decoding for the case where the MIME type is text. Tests ran fine with Python 3.10 on Windows 10. I'd be very grateful if you could merge the PR, bump the version to 1.2.1 and publish it to PyPI.
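A minimal sketch of that idea, using only the standard library: decode bytes only for text attachments (MIMEText expects a str) and pass everything else through as raw bytes with Base64 encoding. The function name `build_attachment_part` and its parameters are hypothetical; this is not the code from the PR.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.text import MIMEText


def build_attachment_part(data, mime_type, filename):
    """Build a MIME part, decoding bytes only when the type is text/*."""
    maintype, subtype = mime_type.split("/", 1)
    if maintype == "text":
        # MIMEText needs a str, so decoding is limited to this branch
        if isinstance(data, bytes):
            data = data.decode("utf-8", "ignore")
        part = MIMEText(data, _subtype=subtype)
    else:
        # Binary content stays as raw bytes and is Base64-encoded
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    return part


binary_part = build_attachment_part(b"\x89PNG\r\n\x1a\n", "image/png", "a.png")
text_part = build_attachment_part(b"hello", "text/plain", "a.txt")
print(binary_part.get_payload(decode=True), text_part.get_payload())
```

Decoding the binary part's payload returns the original bytes unchanged, which is the property the empty or corrupted attachments were missing.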

Many thanks for this project!

DayDotMe avatar Jan 28 '22 10:01 DayDotMe

Has this been released? :)

BenjaminHoegh avatar Apr 09 '22 07:04 BenjaminHoegh