localGPT
AttributeError: type object 'hnswlib.Index' has no attribute 'file_handle_count'
Hi all, I'm trying to ingest my documents, but after all of them are loaded (17689 files) I get the error in the title. This is the tail of the log:
2023-11-15 16:23:59,796 - INFO - ingest.py:153 - Loaded 17689 documents from /home/<my home>/dev/localGPT/SOURCE_DOCUMENTS
2023-11-15 16:23:59,797 - INFO - ingest.py:154 - Split into 137095 chunks of text
2023-11-15 16:24:01,045 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-xl
load INSTRUCTOR_Transformer
2023-11-15 16:24:01,351 - INFO - instantiator.py:21 - Created a temporary directory at /tmp/tmpmyq875y7
2023-11-15 16:24:01,352 - INFO - instantiator.py:76 - Writing /tmp/tmpmyq875y7/_remote_module_non_scriptable.py
max_seq_length 512
Traceback (most recent call last):
  File "/home/<my home>/dev/localGPT/ingest.py", line 181, in <module>
    main()
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/<my home>/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/<my home>/dev/localGPT/ingest.py", line 168, in main
    db = Chroma.from_documents(
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 568, in from_texts
    chroma_collection = cls(
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 120, in __init__
    self._client = chromadb.Client(_client_settings)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/__init__.py", line 143, in Client
    api = system.instance(API)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/config.py", line 195, in instance
    impl = type(self)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/api/segment.py", line 82, in __init__
    self._manager = self.require(SegmentManager)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/config.py", line 134, in require
    inst = self._system.instance(type)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/config.py", line 195, in instance
    impl = type(self)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/segment/impl/manager/local.py", line 73, in __init__
    // PersistentLocalHnswSegment.get_file_handle_count()
  File "/usr/local/lib/python3.10/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 398, in get_file_handle_count
    hnswlib_count = hnswlib.Index.file_handle_count
AttributeError: type object 'hnswlib.Index' has no attribute 'file_handle_count'
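The last frame shows chromadb's `PersistentLocalHnswSegment.get_file_handle_count()` reading `hnswlib.Index.file_handle_count`, so whatever `hnswlib` module this interpreter imports does not define that attribute. A minimal diagnostic sketch, assuming (not confirmed in this thread) that the attribute comes from Chroma's fork of hnswlib, published as the `chroma-hnswlib` distribution, and that a separately installed upstream `hnswlib` package or a mismatched version could shadow it. It only reports what is installed and which file would be imported; it changes nothing.

```python
import importlib
import importlib.metadata
import importlib.util


def diagnose_hnswlib() -> dict:
    """Report which `hnswlib` module this interpreter imports and whether
    `hnswlib.Index` defines `file_handle_count`, the attribute that
    chromadb's PersistentLocalHnswSegment.get_file_handle_count() reads."""
    report = {
        "module_path": None,
        "has_file_handle_count": False,
        "import_error": None,
        "distributions": {},
    }

    # Installed versions of the distributions assumed to be involved;
    # None means the distribution is not installed in this environment.
    for dist in ("hnswlib", "chroma-hnswlib", "chromadb"):
        try:
            report["distributions"][dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            report["distributions"][dist] = None

    spec = importlib.util.find_spec("hnswlib")
    if spec is None:
        return report  # no importable `hnswlib` module at all

    # File that would actually be imported (e.g. under ~/.local or /usr/local).
    report["module_path"] = spec.origin
    try:
        hnswlib = importlib.import_module("hnswlib")
    except ImportError as exc:
        report["import_error"] = str(exc)
        return report

    # getattr guard: avoid an AttributeError if the module has no Index class.
    index_cls = getattr(hnswlib, "Index", None)
    report["has_file_handle_count"] = hasattr(index_cls, "file_handle_count")
    return report


if __name__ == "__main__":
    for key, value in diagnose_hnswlib().items():
        print(f"{key}: {value}")
```

Run it with the same `python` that runs `ingest.py`. If `has_file_handle_count` is False while both `hnswlib` and `chroma-hnswlib` show a version, the upstream package shadowing the fork is a plausible suspect; that is an inference, not a verified fix for this report.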
In the constants.py file, I set these constants:
EMBEDDING_MODEL_NAME = "hkunlp/instructor-xl"
MODEL_ID = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
MODEL_BASENAME = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"
My current configuration is:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.04              Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090         On | 00000000:65:00.0  On |                  Off |
|  0%   36C    P8              15W / 450W |   1863MiB / 24564MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
The installed pip packages match the requirements.txt list:
# Natural Language Processing
langchain==0.0.267
chromadb==0.4.6
pdfminer.six==20221105
InstructorEmbedding
sentence-transformers
faiss-cpu
huggingface_hub
transformers
protobuf==3.20.2; sys_platform != 'darwin'
protobuf==3.20.2; sys_platform == 'darwin' and platform_machine != 'arm64'
protobuf==3.20.3; sys_platform == 'darwin' and platform_machine == 'arm64'
auto-gptq==0.2.2
docx2txt
unstructured
unstructured[pdf]
# Utilities
urllib3==1.26.6
accelerate
bitsandbytes ; sys_platform != 'win32'
bitsandbytes-windows ; sys_platform == 'win32'
click
flask
requests
# Streamlit related
streamlit
Streamlit-extras
# Excel File Manipulation
openpyxl
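The traceback above loads `click` from `/home/<my home>/.local/lib/python3.10/site-packages` but `langchain` and `chromadb` from `/usr/local/lib/python3.10/dist-packages`, so it may be worth checking the exact pins against the interpreter that actually runs `ingest.py`. A small sketch under stated assumptions: it is run from the directory containing `requirements.txt`, it compares only exact `name==version` pins, and it skips comments, unpinned names, and lines with environment markers (`; sys_platform ...`), since evaluating markers properly would need a full requirement parser such as the `packaging` library.

```python
import importlib.metadata


def parse_pins(requirements_text: str) -> dict:
    """Collect exact `name==version` pins from requirements.txt content.

    Comments, unpinned names, and lines carrying environment markers
    are skipped.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ";" in line or "==" not in line:
            continue
        name, version = [part.strip() for part in line.split("==", 1)]
        pins[name] = version
    return pins


def compare_installed(pins: dict) -> dict:
    """Map each pinned name to (pinned, installed); installed is None if missing."""
    result = {}
    for name, pinned in pins.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            installed = None
        result[name] = (pinned, installed)
    return result


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as fh:
        report = compare_installed(parse_pins(fh.read()))
    for name, (pinned, installed) in report.items():
        status = "OK" if pinned == installed else "MISMATCH"
        print(f"{status:8} {name}: pinned {pinned}, installed {installed}")
```

A `MISMATCH` line for `chromadb` or `langchain` would mean the environment that raised the error is not the one described by `requirements.txt`.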
@paxvinci, did you get any update on your issue?
@bp020108 No. I took a different approach: I'm building localGPT from scratch.