private-gpt
python ingest.py throws error "file is not a zip file"
Describe the bug and how to reproduce it
- create .env file from template
- create db and models folders
- download default model
- add some txt files in source_documents
- run "python ingest.py"
Expected behavior
Script completes with no errors
Environment (please complete the following information):
- OS / hardware: Linux (archlinux)/NVidia RTX2080 Ti Super/Intel i9
- Python version 3.11.3
- Other relevant information
Additional context
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/sam/steam/git3rd/privateGPT/ingest.py", line 89, in load_single_document
    return loader.load()
           ^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 71, in load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/langchain/document_loaders/html.py", line 11, in _get_elements
    from unstructured.partition.html import partition_html
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/partition/html.py", line 6, in <module>
    from unstructured.documents.html import HTMLDocument
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/documents/html.py", line 25, in <module>
    from unstructured.partition.text_type import (
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 21, in <module>
    from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 32, in <module>
    _download_nltk_package_if_not_present(package_name, package_category)
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 21, in _download_nltk_package_if_not_present
    nltk.find(f"{package_category}/{package_name}")
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 555, in find
    return find(modified_name, paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 542, in find
    return ZipFilePathPointer(p, zipentry)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 394, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 935, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/usr/lib/python3.11/zipfile.py", line 1301, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.11/zipfile.py", line 1368, in _RealGetContents
    raise BadZipFile("
I'm also getting this
This worked for me (you only need to run it once):
import nltk
nltk.download("averaged_perceptron_tagger")
Yes, indeed, that does the job! Thanks
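If the download keeps failing with the same BadZipFile error, one possible cause (an assumption, not confirmed in this thread) is a partially downloaded archive left behind in one of the NLTK data directories. Below is a minimal sketch for locating such files using only the standard library. The helper name find_corrupt_zips is hypothetical and not part of NLTK.

```python
import zipfile
from pathlib import Path


def find_corrupt_zips(data_dirs):
    """Return paths of *.zip files under data_dirs that are not valid zip archives.

    Hypothetical helper for illustration; not part of NLTK or privateGPT.
    """
    corrupt = []
    for data_dir in data_dirs:
        root = Path(data_dir)
        if not root.is_dir():
            # Search paths often include directories that do not exist; skip them.
            continue
        for candidate in sorted(root.rglob("*.zip")):
            # is_zipfile() returns False for truncated or non-zip files.
            if not zipfile.is_zipfile(candidate):
                corrupt.append(candidate)
    return corrupt
```

Passing nltk.data.path (the list of directories NLTK searches) as data_dirs should list any broken archives. Deleting those files and rerunning the nltk.download(...) call from the earlier comment would then let NLTK fetch a fresh copy.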