python ingest.py throws error "file is not a zip file"

Open oneindelijk opened this issue 2 years ago • 3 comments

Describe the bug and how to reproduce it

  1. create .env file from template
  2. create db and models folders
  3. download default model
  4. add some txt files in source_documents
  5. run "python ingest.py"

Expected behavior

Script completes with no errors.

Environment (please complete the following information):

  • OS / hardware: Linux (archlinux)/NVidia RTX2080 Ti Super/Intel i9
  • Python version 3.11.3
  • Other relevant information

Additional context

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/sam/steam/git3rd/privateGPT/ingest.py", line 89, in load_single_document
    return loader.load()
           ^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 71, in load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/langchain/document_loaders/html.py", line 11, in _get_elements
    from unstructured.partition.html import partition_html
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/partition/html.py", line 6, in <module>
    from unstructured.documents.html import HTMLDocument
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/documents/html.py", line 25, in <module>
    from unstructured.partition.text_type import (
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/partition/text_type.py", line 21, in <module>
    from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 32, in <module>
    _download_nltk_package_if_not_present(package_name, package_category)
  File "/home/sam/.local/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 21, in _download_nltk_package_if_not_present
    nltk.find(f"{package_category}/{package_name}")
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 555, in find
    return find(modified_name, paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 542, in find
    return ZipFilePathPointer(p, zipentry)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 394, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/compat.py", line 41, in _decorator
    return init_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sam/.local/lib/python3.11/site-packages/nltk/data.py", line 935, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/usr/lib/python3.11/zipfile.py", line 1301, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.11/zipfile.py", line 1368, in _RealGetContents
    raise BadZipFile("

oneindelijk avatar Jun 02 '23 17:06 oneindelijk

I'm also getting this

drudge avatar Jun 03 '23 00:06 drudge

This worked for me (you only need to run it once):

import nltk
nltk.download("averaged_perceptron_tagger")
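
If the error persists, it may help to see which archive is broken. The traceback shows NLTK opening a zip inside its data directory and Python's zipfile module rejecting it, so a partially downloaded archive is a plausible cause. The sketch below only uses the standard library and lists every .zip that is not a valid archive. The helper name find_corrupt_zips is made up for illustration, and ~/nltk_data is assumed as the data directory (a common per-user default; yours may differ).

```python
import zipfile
from pathlib import Path


def find_corrupt_zips(data_dir: Path) -> list[Path]:
    """Return every *.zip under data_dir that is not a valid zip archive."""
    if not data_dir.is_dir():
        return []
    # zipfile.is_zipfile returns False for truncated or non-zip files
    return sorted(p for p in data_dir.rglob("*.zip") if not zipfile.is_zipfile(p))


if __name__ == "__main__":
    # Assumption: NLTK data lives in ~/nltk_data; adjust if yours is elsewhere.
    for bad in find_corrupt_zips(Path.home() / "nltk_data"):
        print(f"not a zip file: {bad}")
```

Deleting any file it reports and running the nltk.download call again should fetch a fresh copy.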

ibushong avatar Jun 03 '23 07:06 ibushong

nltk.download("averaged_perceptron_tagger")

Yes, indeed, that does the job! Thanks

oneindelijk avatar Jun 03 '23 10:06 oneindelijk