AttributeError: type object 'hnswlib.Index' has no attribute 'file_handle_count'

Open paxvinci opened this issue 1 year ago • 2 comments

Hi all, I'm trying to ingest my documents, but after all of them (17689 files) are loaded I get the error in the title. This is the tail of the log:

2023-11-15 16:23:59,796 - INFO - ingest.py:153 - Loaded 17689 documents from /home/<my home>/dev/localGPT/SOURCE_DOCUMENTS
2023-11-15 16:23:59,797 - INFO - ingest.py:154 - Split into 137095 chunks of text
2023-11-15 16:24:01,045 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-xl
load INSTRUCTOR_Transformer
2023-11-15 16:24:01,351 - INFO - instantiator.py:21 - Created a temporary directory at /tmp/tmpmyq875y7
2023-11-15 16:24:01,352 - INFO - instantiator.py:76 - Writing /tmp/tmpmyq875y7/_remote_module_non_scriptable.py
max_seq_length  512
Traceback (most recent call last):
  File "/home/<my home>/dev/localGPT/ingest.py", line 181, in <module>
    main()
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/<my home>/dev/localGPT/ingest.py", line 168, in main
    db = Chroma.from_documents(
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 568, in from_texts
    chroma_collection = cls(
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 120, in __init__
    self._client = chromadb.Client(_client_settings)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/__init__.py", line 143, in Client
    api = system.instance(API)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/config.py", line 195, in instance
    impl = type(self)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/api/segment.py", line 82, in __init__
    self._manager = self.require(SegmentManager)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/config.py", line 134, in require
    inst = self._system.instance(type)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/config.py", line 195, in instance
    impl = type(self)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/segment/impl/manager/local.py", line 73, in __init__
    // PersistentLocalHnswSegment.get_file_handle_count()
  File "/usr/local/lib/python3.10/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 398, in get_file_handle_count
    hnswlib_count = hnswlib.Index.file_handle_count
AttributeError: type object 'hnswlib.Index' has no attribute 'file_handle_count'
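
The traceback ends where chromadb reads `hnswlib.Index.file_handle_count`, so a first diagnostic step is to check whether the `hnswlib` module that Python actually imports exposes that attribute, and which hnswlib-related distributions pip reports as installed. The following is a minimal sketch, not a confirmed fix. The helper names `installed_version` and `class_has_attribute` are hypothetical, and the idea that chromadb expects the `chroma-hnswlib` fork rather than upstream `hnswlib` is an assumption that this thread does not confirm.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def class_has_attribute(module_name, class_name, attr_name):
    """Return True or False, or None when the module or class is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    cls = getattr(module, class_name, None)
    if cls is None:
        return None
    return hasattr(cls, attr_name)


if __name__ == "__main__":
    # Assumption: these are the distributions relevant to the error above.
    for dist in ("hnswlib", "chroma-hnswlib", "chromadb"):
        print(f"{dist}: {installed_version(dist)}")

    # The attribute named in the AttributeError.
    print(
        "hnswlib.Index.file_handle_count present:",
        class_has_attribute("hnswlib", "Index", "file_handle_count"),
    )
```

If the last line prints `False`, the imported `hnswlib` build lacks the attribute chromadb is looking for. Comparing that result with the installed distribution versions may show whether more than one package is providing the `hnswlib` module.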

In constants.py I set these constants:

EMBEDDING_MODEL_NAME = "hkunlp/instructor-xl"
MODEL_ID = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
MODEL_BASENAME = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"

My current configuration is:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.04              Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:65:00.0  On |                  Off |
|  0%   36C    P8              15W / 450W |   1863MiB / 24564MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

The pip packages are in line with the requirements.txt list:

# Natural Language Processing
langchain==0.0.267
chromadb==0.4.6
pdfminer.six==20221105
InstructorEmbedding
sentence-transformers
faiss-cpu
huggingface_hub
transformers
protobuf==3.20.2; sys_platform != 'darwin'
protobuf==3.20.2; sys_platform == 'darwin' and platform_machine != 'arm64'
protobuf==3.20.3; sys_platform == 'darwin' and platform_machine == 'arm64'
auto-gptq==0.2.2
docx2txt
unstructured
unstructured[pdf]

# Utilities
urllib3==1.26.6
accelerate
bitsandbytes ; sys_platform != 'win32'
bitsandbytes-windows ; sys_platform == 'win32'
click
flask
requests

# Streamlit related
streamlit
Streamlit-extras

# Excel File Manipulation
openpyxl

paxvinci • Nov 15 '23 15:11

@paxvinci did you get any update on your issue?

bp020108 • Feb 07 '24 16:02

@bp020108 No. I tried another approach: I'm building localGPT from scratch.

paxvinci • Feb 08 '24 09:02