robotframework-imaplibrary
How to extract PDF from a multipart email
Hello! Is it possible to read one part of a multipart email, of type application/pdf, and write it to a file?
I have already tried:
\ ${content-type}= Get Multipart Content Type
\ ${payload}= Get Multipart Payload
\ Run Keyword If '${content-type}' == 'application/pdf' Create File blablabla_NotDecoded.pdf ${payload}
but the generated file could not be read by Adobe Reader.
And also
\ ${content-type}= Get Multipart Content Type
\ ${payload}= Get Multipart Payload decode=True
\ Run Keyword If '${content-type}' == 'application/pdf' Create File blablabla_Decoded.pdf ${payload}
didn't work since RF said: "UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 11: ordinal not in range(128)"
Regards
@winko, I added the multipart code for HTML - have never tested with other content types.
Just curious, what is the charset of the mime part containing the pdf? In my testing, emails have charset "UTF-8", so it's surprising that RF used 'ascii'.
If you know some Python, the quickest way to debug this is to write a short script using the imaplib and email modules to retrieve and decode the email. Here's a snippet to get you started, but you'll have to hack it until it works.
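import imaplib
import email

server = 'imap.gmail.com'
user = ''
pw = ''
# IMAP search criteria; fill in the sender and recipient addresses
crit = ['FROM', '', 'TO', '', 'UNSEEN']
# Position of the PDF among the message parts (0-based)
pdf_index = 1

imap = imaplib.IMAP4_SSL(server, 993)
imap.login(user, pw)
imap.select(readonly=True)
typ, msgnums = imap.search(None, *crit)
# Fetch the last message in the search result
data = imap.fetch(msgnums[0].split()[-1], '(RFC822)')[1][0][1]
imap.close()

msg = email.message_from_string(data.decode())
pdf_part = msg.get_payload()[pdf_index]
# decode=True undoes the Content-Transfer-Encoding (e.g. base64)
pdf = pdf_part.get_payload(decode=True)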
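The snippet above relies on a fixed pdf_index. As a minimal sketch, assuming Python 3 and a message that has already been fetched as raw bytes, the parts could instead be selected by content type. The helper name find_pdf_payloads and the locally built test message are illustrative only and are not part of ImapLibrary:

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def find_pdf_payloads(raw_message):
    """Return the decoded bytes of every application/pdf part (hypothetical helper)."""
    msg = email.message_from_bytes(raw_message)
    return [
        part.get_payload(decode=True)  # undo base64 or quoted-printable
        for part in msg.walk()
        if part.get_content_type() == 'application/pdf'
    ]


# Build a small message locally instead of contacting a real IMAP server.
fake_pdf = b'%PDF-1.4\n%\x83\x92\xfa\xfe\n'
outer = MIMEMultipart()
outer.attach(MIMEText('See attachment.'))
outer.attach(MIMEApplication(fake_pdf, _subtype='pdf'))

payloads = find_pdf_payloads(outer.as_bytes())
```

With a real mailbox, the bytes returned by imap.fetch would be passed to find_pdf_payloads in place of outer.as_bytes().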
@martinhill could you implement a keyword for extracting the pdf data and send me a pull request please?
@bogensberger I tried extracting pdf data in Python and I think the code should work fine as is, using decode=True. I suspect the problem is to do with Create File.
I will send you a pull request for a different issue, though. I found a problem with gmail, when the email arrives after the Open Mailbox keyword was executed.
@winko can you run your second case with debugging on and attach the log?
Thanks for your investigations!
Here comes my debug log:
20140228 17:05:02.201 - INFO - +----- START KW: ${content-type} = ImapLibrary.Get Multipart Content Type [ ]
20140228 17:05:02.201 - INFO - ${content-type} = application/pdf
20140228 17:05:02.201 - INFO - +----- END KW: ${content-type} = ImapLibrary.Get Multipart Content Type (0)
20140228 17:05:02.202 - INFO - +----- START KW: ${payload} = ImapLibrary.Get Multipart Payload [ decode=True ]
20140228 17:05:02.297 - INFO - ${payload} = %PDF-1.4
%\x83\x92\xfa\xfe
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
3 0 obj
<<
/CreationDate (D:20030618124400)
/Author (\xfe\xff K a t j a M \xf6 l l e r)
/Keywords ()
/Subject...
20140228 17:05:02.298 - INFO - +----- END KW: ${payload} = ImapLibrary.Get Multipart Payload (1)
20140228 17:05:02.298 - INFO - +----- START KW: OperatingSystem.Create File [ blablabla.pdf | ${payload} ]
20140228 17:05:02.317 - FAIL - UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 11: ordinal not in range(128)
20140228 17:05:02.318 - DEBUG - Traceback (most recent call last):
File "C:\Python27\lib\site-packages\robot\libraries\OperatingSystem.py", line 611, in create_file
path = self._write_to_file(path, content, encoding, 'w')
File "C:\Python27\lib\site-packages\robot\libraries\OperatingSystem.py", line 630, in _write_to_file
f.write(content.encode(encoding))
20140228 17:05:02.318 - INFO - +----- END KW: OperatingSystem.Create File (20)
@winko It seems Robot Framework can't always handle binary data such as PDF. Looking at libraries/OperatingSystem.py in the robot package, I found:
def _write_to_file(self, path, content, encoding, mode):
    path = self._absnorm(path)
    parent = os.path.dirname(path)
    if not os.path.exists(parent):
        os.makedirs(parent)
    f = open(path, mode+'b')
    try:
        f.write(content.encode(encoding))
    finally:
        f.close()
    return path
I tested out writing pdf to file like this:
f.write(pdf)
...and it worked. A PDF is binary data, which should not be encoded the way the Create File keyword does. I can only suggest writing your own keyword to save your PDF data: all you need is to clone the code from OperatingSystem.py and remove the encoding step. Alternatively, find a way to write your test that doesn't save the PDF to a file.
Martin
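The suggestion above (clone _write_to_file and drop the encoding step) might look roughly like the following. This is only a sketch, assuming Python 3 and a payload that is already a bytes object. The function name create_file_without_encoding is made up for illustration, and os.path.abspath/normpath stands in for the library's internal _absnorm helper:

```python
import os
import tempfile


def create_file_without_encoding(path, content):
    """Write content (bytes) to path unchanged, creating parent directories."""
    path = os.path.abspath(os.path.normpath(path))
    parent = os.path.dirname(path)
    if not os.path.exists(parent):
        os.makedirs(parent)
    # Binary mode and no .encode() call: the bytes reach the disk as-is.
    with open(path, 'wb') as f:
        f.write(content)
    return path


# Quick round-trip check with bytes resembling the start of a PDF.
pdf_bytes = b'%PDF-1.4\r\n%\x83\x92\xfa\xfe\r\n'
target = os.path.join(tempfile.mkdtemp(), 'out', 'sample.pdf')
written = create_file_without_encoding(target, pdf_bytes)
with open(written, 'rb') as f:
    round_trip = f.read()
```

Saved in a module (say BinaryFiles.py, a hypothetical name) and imported with the Library setting, the function should be callable from a test as the keyword Create File Without Encoding, taking the path and ${payload} as arguments.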