chatgpt-retrieval
Problem with partition_pdf module
Hello, when I try to run the code the following error is displayed:
Traceback (most recent call last):
File "C:\Users\Diego Sousa\Desktop\botchatgpt\botchatgpt\chat02.py", line 35, in
return partition(filename=self.file_path, **self.unstructured_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\auto.py", line 221, in partition
elements = partition_pdf(
^^^^^^^^^^^^^
NameError: name 'partition_pdf' is not defined. Did you mean: 'partition_xml'?
Has anyone else had this problem?
+1
+1
Following
+1
To make it work I had to do the following.
In the file .../site-packages/unstructured/partition/auto.py
add the line: from unstructured.partition.pdf import partition_pdf
Then run: pip3 install pdf2image pdfminer.six
Lastly, if you are on macOS, search for 'Install Certificates.command' in Finder and open it.
Then run the following in the terminal (the last two lines are typed at the Python prompt):
python3
import nltk
nltk.download()
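Before editing auto.py by hand, it may help to see which of the pieces mentioned above actually import in your environment. The following is only a rough diagnostic sketch, not part of unstructured: the helper name check_imports and the MODULES list are made up for illustration, and "pdfminer" is the import name of the pdfminer.six package.

```python
import importlib

# Modules named in the steps above; "pdfminer" is the import name of pdfminer.six.
# This list is an assumption for illustration, adjust it to your setup.
MODULES = ["pdf2image", "pdfminer", "nltk", "unstructured.partition.pdf"]


def check_imports(names):
    """Try to import each module; map name -> None on success, else the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or an error raised while the module loads
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in check_imports(MODULES).items():
        print(f"{name}: {'ok' if error is None else error}")
```

If unstructured.partition.pdf itself fails to import, the error text printed for it should point at what is missing, which is probably more useful than the NameError raised later inside partition().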
Downgrading to version 0.7.12 resolved the problem for me. You can do this by running the following command in your virtual environment:
pip install unstructured==0.7.12
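To confirm which version ended up installed after the downgrade, something like the sketch below can be run inside the same virtual environment. It only uses the standard library; the helper name installed_version is hypothetical, and 0.7.12 is simply the version reported to work in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("unstructured")
    print("unstructured:", found or "not installed")
    if found is not None and found != "0.7.12":
        # 0.7.12 is the version reported as working in this thread.
        print("This differs from 0.7.12, the version reported to work here.")
```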
Confirmed, pip install unstructured==0.7.12 works.
To make it work I had to:
at the file
.../site-packages/unstructured/partition/auto.py
add the line:
from unstructured.partition.pdf import partition_pdf
then
pip3 install pdf2image pdfminer.six
Lastly, if you are on macOS, search for 'Install Certificates.command' in Finder and open it.
Then run the following in the terminal (the last two lines are typed at the Python prompt):
python3
import nltk
nltk.download()
I tried this but then I got this error:
File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py", line 168, in _get_elements
from unstructured.partition.auto import partition
File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/unstructured/partition/auto.py", line 28, in
Any ideas, please? @3dylson