chatgpt-retrieval

Problem with partition_pdf module

Open • decsousa opened this issue 1 year ago • 8 comments

Hello, when I try to run the code, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Diego Sousa\Desktop\botchatgpt\botchatgpt\chat02.py", line 35, in <module>
    index = VectorstoreIndexCreator().from_loaders([loader])
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\indexes\vectorstore.py", line 72, in from_loaders
    docs.extend(loader.load())
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 137, in load
    self.load_file(i, p, docs, pbar)
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 94, in load_file
    raise e
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 88, in load_file
    sub_docs = self.loader_cls(str(item), **self.loader_kwargs).load()
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 86, in load
    elements = self._get_elements()
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 171, in _get_elements
    return partition(filename=self.file_path, **self.unstructured_kwargs)
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\auto.py", line 221, in partition
    elements = partition_pdf(
NameError: name 'partition_pdf' is not defined. Did you mean: 'partition_xml'?

Has anyone else run into this problem?
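One way to see the real cause: the NameError means auto.py never bound the name partition_pdf. A plausible explanation (an assumption, not checked against the unstructured source for this exact version) is that auto.py only imports partition_pdf when an optional dependency such as pdf2image is installed, which would fit the fixes posted further down. Importing the function directly should then surface the underlying ImportError instead of the NameError. A minimal sketch; import_error_for is a hypothetical helper name:

```python
import importlib
from typing import Optional


def import_error_for(module: str, attr: str) -> Optional[str]:
    """Try the equivalent of `from module import attr`.

    Returns the error as "ExceptionType: message", or None if the import works.
    """
    try:
        getattr(importlib.import_module(module), attr)
    except (ImportError, AttributeError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # The same import that auto.py needs in order to handle PDF files.
    error = import_error_for("unstructured.partition.pdf", "partition_pdf")
    print(error or "partition_pdf imports fine")
```

Run it with the same interpreter that runs chat02.py so it sees the same site-packages.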

decsousa, Aug 03 '23 10:08

+1

psujit775, Aug 03 '23 15:08

+1

GavinXZhang, Aug 03 '23 15:08

Following

JayKayNJIT, Aug 04 '23 21:08

+1

fengmzhu, Aug 05 '23 02:08

To make it work, I had to:

1. In the file .../site-packages/unstructured/partition/auto.py, add the line:
   from unstructured.partition.pdf import partition_pdf
2. Run: pip3 install pdf2image pdfminer.six
3. Last, if you are on macOS, search for 'Install Certificates.command' in Finder and open it. Then run the following in the terminal:

   python3
   >>> import nltk
   >>> nltk.download()
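Calling nltk.download() with no arguments opens NLTK's interactive downloader. For a non-interactive check, the sketch below looks for specific resources with nltk.data.find(), which raises LookupError when a resource is missing, and prints the nltk.download() call needed for each missing one. The resource list (punkt and averaged_perceptron_tagger) is an assumption about what unstructured uses for tokenization, not something stated in this thread; missing_nltk_resources is a hypothetical helper name.

```python
import importlib.util

# Assumption: NLTK resources unstructured loads for tokenization.
# Check against your installed unstructured version.
NLTK_RESOURCES = ["tokenizers/punkt", "taggers/averaged_perceptron_tagger"]


def missing_nltk_resources(resources: list[str]) -> list[str]:
    """Return the resource paths that nltk.data.find() cannot locate."""
    if not resources:
        return []
    if importlib.util.find_spec("nltk") is None:
        # nltk itself is not installed, so every resource counts as missing.
        return list(resources)
    import nltk

    missing = []
    for path in resources:
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_nltk_resources(NLTK_RESOURCES):
        # nltk.download() takes the package id, i.e. the last path component.
        package_id = path.rsplit("/", 1)[-1]
        print(f"missing {path}: run nltk.download({package_id!r})")
```

This only reports what is missing; the actual download still needs network access (and, on macOS, working certificates, which is what the 'Install Certificates.command' step addresses).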

3dylson, Aug 05 '23 18:08

Downgrading unstructured to version 0.7.12 resolved the problem for me. You can do this by running the following command in your virtual environment:

pip install unstructured==0.7.12
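To confirm the downgrade took effect in the interpreter that actually runs the script (easy to get wrong when several virtual environments are around), the standard library's importlib.metadata can report the installed version. A small sketch; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The pin reported to work in this thread.
PINNED = "0.7.12"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("unstructured")
    if found is None:
        print("unstructured is not installed in this environment")
    elif found != PINNED:
        print(f"unstructured {found} is installed; expected {PINNED}")
    else:
        print(f"unstructured {found} matches the pin")
```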

bobbyfongprivate, Aug 12 '23 01:08

pip install unstructured==0.7.12 works

fire115, Aug 15 '23 19:08

I tried 3dylson's workaround, but then I got the following error:

  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py", line 168, in _get_elements
    from unstructured.partition.auto import partition
  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/unstructured/partition/auto.py", line 28, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/unstructured/partition/pdf.py", line 19, in <module>
    from pillow_heif import register_heif_opener
ModuleNotFoundError: No module named 'pillow_heif'

Any ideas, please? @3dylson
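pillow_heif is the import name of the pillow-heif package on PyPI, so pip install pillow-heif is the likely next step here. In this traceback auto.py imports partition_pdf directly (line 28), so a missing PDF dependency now shows up as a ModuleNotFoundError rather than the original NameError. The sketch below checks, without importing them, which of the modules named in this thread are missing and prints a matching pip command. The module-to-package list comes from this thread and is an assumption; your unstructured version may need more. missing_packages is a hypothetical helper name.

```python
import importlib.util

# Import name -> PyPI package name, for the modules mentioned in this thread.
# Assumption: your unstructured version may need additional packages.
PDF_DEPENDENCIES = {
    "pdf2image": "pdf2image",
    "pdfminer": "pdfminer.six",
    "pillow_heif": "pillow-heif",
}


def missing_packages(deps: dict[str, str]) -> list[str]:
    """Return the PyPI names of packages whose top-level module cannot be found."""
    # find_spec() locates a top-level module without importing it.
    return [pkg for module, pkg in deps.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(PDF_DEPENDENCIES)
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed PDF dependencies are installed.")
```

find_spec only checks that the Python module is present; pdf2image, for instance, also needs the Poppler utilities installed on the system.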

Zhi0467, Jun 28 '24 11:06