PIL.UnidentifiedImageError: cannot identify image file

Open aravindajju opened this issue 3 years ago • 0 comments

I am trying to read a PNG image from a PDF file, and I get the following error:

Traceback (most recent call last):
  File "D:\workspace\pdfextraction\pdfextract.py", line 10, in <module>
    im = page.images[1].as_pil()  # requires pillow
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\site-packages\minecart\content.py", line 368, in as_pil
    image = PIL.Image.open(io.BytesIO(image_data))
  File "C:\Users\Lenovo\AppData\Roaming\Python\Python39\site-packages\PIL\Image.py", lin

Here is my code:

import minecart

pdffile = open('RunScribe-330601.pdf', 'rb')
doc = minecart.Document(pdffile)
page = doc.get_page(1)

#for shape in page.shapes.iter_in_bbox((0, 0, 100, 200)):
#    print (shape.path, shape.fill.color.as_rgb())

im = page.images[1].as_pil()  # requires pillow
#im.show()

for image in page.images:
    print(image.as_pil())

I have tried multiple PDF files containing PNG and JPEG images. Pages with JPEG images work fine; the error only occurs with PNG images. Here is the PDF file that I tried:

https://drive.google.com/file/d/1i_ZY5JPYEfs_v43DFuHUEL0eUaeSIv0R/view?usp=sharing
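
One way to narrow this down might be to check whether the bytes handed to PIL.Image.open start with a known image signature at all. The sketch below rests on an assumption that is not confirmed for this file: that a PDF usually keeps JPEG data unchanged but stores PNG-sourced images as bare pixel samples with no PNG header, which Pillow would have nothing to identify. The helper name sniff_image_format and the two sample byte strings are hypothetical, and getting the actual stream bytes out of minecart is not shown.

```python
from typing import Optional

# Leading byte signatures of a few formats Pillow can identify from bare bytes.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return the format suggested by the leading bytes, or None if unknown."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# Hypothetical stand-ins for image data extracted from a PDF page:
jpeg_like = b"\xff\xd8\xff\xe0" + b"\x00" * 16  # JPEG stream kept as-is
raw_pixels = bytes([255, 0, 0] * 4)             # bare RGB samples, no header

print(sniff_image_format(jpeg_like))
print(sniff_image_format(raw_pixels))
```

If the failing image data comes back as None here, the stream is probably not a self-contained image file, and it would need to be rebuilt from its width, height and color space rather than opened directly.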

Any pointers on what could be the reason?

Regards, Aravind.

aravindajju, Apr 26 '21 04:04