
UnidentifiedImageError: cannot identify image file '/tmp/tmpa3o9dj66/b5d7995b-82db-4257-bdcb-20795a00c72b-01.ppm'

Open SaleemMalikAI opened this issue 1 year ago • 1 comments

I have a clear PDF with proper images, but this code gives the error in the title:

from unstructured.partition.pdf import partition_pdf
from PIL import UnidentifiedImageError

# Extract images, tables, and chunk text
raw_pdf_elements = partition_pdf(
    filename='/content/2023-conocophillips-aim-presentation.pdf',
    extract_images_in_pdf=True,
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path='/content/',
)

I did a lot of R&D but could not find any solution. unstructured is not a good PDF parser.

SaleemMalikAI, Aug 03 '24 19:08

If you are running on Colab or Jupyter, restart the session and then try again.
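
If the error comes back after a restart, it may help to check whether the temporary .ppm file named in the message is a readable image at all. The snippet below is only a diagnostic sketch using the standard library: `looks_like_netpbm` is a hypothetical helper name (not part of unstructured or PIL), and it only tests for the two-byte Netpbm magic number (P1 to P6), not the full file.

```python
from pathlib import Path

# Magic numbers of the Netpbm family (PBM/PGM/PPM, ASCII and binary variants).
NETPBM_MAGIC = {b"P1", b"P2", b"P3", b"P4", b"P5", b"P6"}


def looks_like_netpbm(path):
    """Hypothetical helper: True if the file exists and starts with a Netpbm magic number."""
    p = Path(path)
    if not p.is_file():
        # A missing file (e.g. a temp dir that was already cleaned up) cannot be identified.
        return False
    with p.open("rb") as fh:
        # An empty or truncated file yields fewer than 2 bytes and fails the check.
        return fh.read(2) in NETPBM_MAGIC
```

A `False` result for the path in the traceback would suggest the intermediate image was missing, empty, or truncated, which points at the PDF-to-image step rather than at the PDF content. That is an assumption to verify, not a confirmed cause.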

ShkAmmarHussain, Aug 19 '24 09:08

Duplicate of #3102

scanny, Dec 17 '24 19:12