
GPTChromaIndex indexing fails when documents contain an empty file

Open timonmat opened this issue 1 year ago • 1 comments

Indexing a folder that contains an empty document fails when using GPTChromaIndex. It is easy to reproduce: create an empty file in a folder of documents and load the folder, at least with SimpleDirectoryReader.

2023-03-21 12:14:31.215 Uncaught app exception
Traceback (most recent call last):
  File "/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/github/ChatObsidian/pages/Index_to_chroma.py", line 106, in <module>
    build_chroma_index(documents, collection, reindex)
  File "/github/ChatObsidian/utils/chroma.py", line 52, in build_chroma_index
    index = GPTChromaIndex(documents, chroma_collection=_chroma_collection, embed_model=embed_model, prompt_helper=prompt_helper)
  File "/lib/python3.10/site-packages/llama_index/indices/vector_store/vector_indices.py", line 500, in __init__
    super().__init__(
  File "/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 63, in __init__
    super().__init__(
  File "/lib/python3.10/site-packages/llama_index/indices/base.py", line 114, in __init__
    self._index_struct = self.build_index_from_documents(documents)
  File "/lib/python3.10/site-packages/llama_index/token_counter/token_counter.py", line 86, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/lib/python3.10/site-packages/llama_index/indices/base.py", line 286, in build_index_from_documents
    return self._build_index_from_documents(documents)
  File "/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 206, in _build_index_from_documents
    self._add_document_to_index(index_struct, d)
  File "/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 186, in _add_document_to_index
    new_ids = self._vector_store.add(embedding_results)
  File "/lib/python3.10/site-packages/llama_index/vector_stores/chroma.py", line 71, in add
    self._collection.add(
  File "/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 79, in add
    ids = validate_ids(maybe_cast_one_to_many(ids))
  File "/lib/python3.10/site-packages/chromadb/api/types.py", line 71, in maybe_cast_one_to_many
    if isinstance(target[0], (int, float)):
IndexError: list index out of range
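The last frame points at the likely mechanism: `target[0]` is read from a list that has no elements. Below is a minimal sketch of that failure. It assumes the empty document produces no chunks and therefore an empty list of ids, which is an inference from the traceback, not something confirmed in this thread. Both functions are simplified stand-ins written for illustration, not chromadb's actual code, and `safe_maybe_cast_one_to_many` is a hypothetical name.

```python
def maybe_cast_one_to_many(target):
    # Simplified stand-in for the check shown in the last traceback frame.
    # Indexing target[0] assumes the list has at least one element.
    if isinstance(target[0], (int, float)):
        return [target]
    return target


def safe_maybe_cast_one_to_many(target):
    # Hypothetical guard: hand an empty list back unchanged
    # instead of indexing into it.
    if not target:
        return target
    return maybe_cast_one_to_many(target)


try:
    maybe_cast_one_to_many([])
except IndexError as exc:
    print(f"IndexError: {exc}")

print(safe_maybe_cast_one_to_many([]))
```

With an empty list, the first call raises the same `IndexError` as in the traceback, while the guarded version returns the empty list.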

This looks more like a Chroma issue, but I guess a workaround could be added either client-side or in llama_index.
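A client-side workaround could be as simple as dropping empty documents before building the index. The sketch below is one possible approach, not the fix that was eventually shipped. `drop_empty_documents` is a hypothetical helper, and it assumes each document exposes its content as a `text` attribute; pass a different `get_text` callable if that does not hold for your version. The `SimpleNamespace` objects only stand in for real documents so the example runs on its own.

```python
from types import SimpleNamespace


def drop_empty_documents(documents, get_text=lambda doc: doc.text):
    """Return only the documents whose text has non-whitespace content."""
    kept = []
    for doc in documents:
        text = get_text(doc)
        # Skip None, empty strings, and whitespace-only text.
        if text and text.strip():
            kept.append(doc)
    return kept


# Stand-in objects for illustration; real documents would come from the reader.
docs = [
    SimpleNamespace(text="some note"),
    SimpleNamespace(text=""),
    SimpleNamespace(text="  \n"),
]
non_empty = drop_empty_documents(docs)
print(len(non_empty))  # prints 1
```

The filtered list would then be passed to the index in place of the original `documents`.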

timonmat, Mar 21 '23 10:03

Should be a quick fix!

Disiok, Mar 21 '23 17:03

This is fixed in the latest versions of llama-index:

>>> from llama_index import GPTVectorStoreIndex, Document
>>> doc = Document('')
>>> index = GPTVectorStoreIndex.from_documents([doc])
>>> 
>>> from llama_index import GPTVectorStoreIndex, Document
>>> index = GPTVectorStoreIndex.from_documents([])
>>> 

logan-markewich, Jun 06 '23 02:06