unstructured
UnidentifiedImageError: cannot identify image file '/tmp/tmpa3o9dj66/b5d7995b-82db-4257-bdcb-20795a00c72b-01.ppm'
I have a clean PDF with proper images, but this code raises the error above:
from unstructured.partition.pdf import partition_pdf
from PIL import UnidentifiedImageError

# Extract images, tables, and chunk text
raw_pdf_elements = partition_pdf(
    filename='/content/2023-conocophillips-aim-presentation.pdf',
    extract_images_in_pdf=True,
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path='/content/',
)
I did a lot of R&D but could not find any solution. unstructured does not seem to be a good PDF parser.
If you are running on Colab or Jupyter, restart the session and then try again.
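If a restart does not help, it might be worth checking the Poppler install. This is only a guess based on the temp file name in the traceback: a `.ppm` file in a temp directory looks like output from Poppler's `pdftoppm`, but nothing in this thread confirms that is the cause. A rough sketch using only the standard library (`check_pdftoppm` is a made-up helper name, not part of unstructured or Pillow):

```python
import shutil
import subprocess


def check_pdftoppm():
    """Report whether Poppler's pdftoppm is on PATH and what version it prints.

    Hypothetical diagnostic helper; it assumes the failing .ppm file was
    produced by pdftoppm, which this thread does not confirm.
    """
    path = shutil.which("pdftoppm")
    if path is None:
        return {"found": False, "path": None, "version": None}
    try:
        # "pdftoppm -v" prints a version banner, usually on stderr.
        proc = subprocess.run(
            [path, "-v"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return {"found": True, "path": path, "version": None}
    banner = (proc.stderr or proc.stdout).strip()
    version = banner.splitlines()[0] if banner else None
    return {"found": True, "path": path, "version": version}


if __name__ == "__main__":
    print(check_pdftoppm())
```

If `found` is `False`, or the version differs between a working and a failing environment, that could be a lead worth following before blaming the PDF itself.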
Duplicate of #3102