IndexError: index (2864) out of range when not re-creating index or restarting webapp after config change
I indexed some .cpp files as described in https://github.com/snexus/llm-search/issues/90#issuecomment-2920852058: I added a - doc_path: ... entry to the config and ran llmsearch index update, but without restarting the llmsearch interact webapp ... process.
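For reference, a minimal sketch of the kind of entry that was added (the parent keys and the path are illustrative, not copied from my actual config):

```yaml
# Hypothetical excerpt from the llmsearch YAML config; surrounding keys omitted.
# New document source added before running llmsearch index update:
    - doc_path: /home/ubuntu/my-cpp-project   # illustrative path
      scan_extensions:
        - cpp
```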
When I then query something via the web UI, I get:
2025-05-30 00:18:38.969 | DEBUG | __main__:<module>:246 - CONFIG FILE: /home/ubuntu/llm-search/configs/niklas-config-1.yaml
2025-05-30 00:18:38.975 | DEBUG | llmsearch.ranking:get_relevant_documents:105 - Evaluating query: What's the name of the API endpoint that generates thumbnails?
2025-05-30 00:18:38.975 | INFO | llmsearch.ranking:get_relevant_documents:107 - Adding query prefix for retrieval: query:
2025-05-30 00:18:38.975 | INFO | llmsearch.splade:query:248 - SPLADE search will search over all documents of chunk size: 1024. Number of docs: 2865
────────────────────────── Traceback (most recent call last) ───────────────────────────
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_code.py:121 in exec_func_with_error_handling
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py:645 in code_to_exec
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:342 in <module>
339 │ │ │ │ conv_history_rewrite_query
340 │ │ │ )
341 │ │
❱ 342 │ │ output = generate_response(
343 │ │ │ question=text,
344 │ │ │ use_hyde=st.session_state["llm_bundle"].hyde_enabled,
345 │ │ │ use_multiquery=st.session_state["llm_bundle"].multiquery_enabled,
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:219 in __call__
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:261 in _get_or_create_cached_value
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:320 in _handle_cache_miss
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:175 in generate_response
172 ):
173 │ # _config and _bundle are under scored so paratemeters aren't hashed
174 │
❱ 175 │ output = get_and_parse_response(
176 │ │ query=question, config=_config, llm_bundle=_bundle, label=label_filter
177 │ )
178 │ return output
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/process.py:66 in get_and_parse_response
63 │ │ offset_max_chars = 0
64 │
65 │ semantic_search_config = config.semantic_search
❱ 66 │ most_relevant_docs, score = get_relevant_documents(
67 │ │ original_query, queries, llm_bundle, semantic_search_config, label=lab
68 │ │ offset_max_chars = offset_max_chars
69 │ )
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/ranking.py:109 in get_relevant_documents
106 │ │ │ if config.query_prefix:
107 │ │ │ │ logger.info(f"Adding query prefix for retrieval: {config.query
108 │ │ │ │ query = config.query_prefix + query
❱ 109 │ │ │ sparse_search_docs_ids, sparse_scores = sparse_retriever.query(
110 │ │ │ │ search=query, n=config.max_k, label=label, chunk_size=chunk_si
111 │ │ │ )
112
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/splade.py:253 in query
250 │ │ │ )
251 │ │
252 │ │ # print(indices)
❱ 253 │ │ embeddings = self._embeddings[indices] # type: ignore
254 │ │ ids = self._ids[indices] # type: ignore
255 │ │ l2_norm_matrix = scipy.sparse.linalg.norm(embeddings, axis=1)
256
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:30 in __getitem__
27 │ This class provides common dispatching and validation logic for indexing.
28 │ """
29 │ def __getitem__(self, key):
❱ 30 │ │ index, new_shape = self._validate_indices(key)
31 │ │
32 │ │ # 1D array
33 │ │ if len(index) == 1:
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:288 in _validate_indices
285 │ │ │ │ index_ndim = tmp_ndim
286 │ │ │ else: # dense array
287 │ │ │ │ N = self._shape[index_ndim]
❱ 288 │ │ │ │ idx = self._asindices(idx, N)
289 │ │ │ │ index.append(idx)
290 │ │ │ │ array_indices.append(index_ndim)
291 │ │ │ │ index_ndim += 1
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:332 in _asindices
329 │ │ # Check bounds
330 │ │ max_indx = x.max()
331 │ │ if max_indx >= length:
❱ 332 │ │ │ raise IndexError('index (%d) out of range' % max_indx)
333 │ │
334 │ │ min_indx = x.min()
335 │ │ if min_indx < 0:
────────────────────────────────────────────────────────────────────────────────────────
IndexError: index (2864) out of range
It seems to be fixed when I restart llmsearch interact webapp AND run llmsearch index create ... instead of llmsearch index update ...
Is that expected?
If so, it would be nice to get a clearer error than IndexError, telling me that I have to restart the whole webapp after changing the config.
But then again, if I add another entry for another programming language, the IndexError persists.
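For what it's worth, the scipy error itself just means a requested row index is at least as large as the number of rows in the sparse matrix being indexed. A minimal sketch of that failure mode, assuming the webapp still holds an older, smaller embeddings matrix in memory while the candidate document indices already reflect the updated, larger index (my own illustration, not llmsearch code):

```python
import numpy as np
import scipy.sparse

# Stale in-memory SPLADE embeddings: 2864 rows (state from before `index update`).
embeddings = scipy.sparse.csr_matrix(np.random.rand(2864, 8))

# Candidate document indices computed against the updated index of 2865 docs.
indices = np.array([10, 500, 2864])

try:
    _ = embeddings[indices]  # row 2864 does not exist in the cached matrix
except IndexError as exc:
    print(exc)  # index (2864) out of range
```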
Updating or creating an index via the CLI and via the web interface are independent operations, i.e. when the index is updated from the CLI, the web UI is not aware of it.
You can update the index directly via the web UI, as shown below:
Let me know if it works for you.
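For context on why the two are independent: a Streamlit app typically loads its index once per process and keeps it in memory, so files rewritten on disk by a separate CLI run are not picked up until the process is restarted. A hedged sketch of that pattern, with purely illustrative names rather than the actual llmsearch webapp code:

```python
import streamlit as st

@st.cache_resource  # loaded once per Streamlit process
def load_index(path: str) -> dict:
    """Stub standing in for loading embeddings/ids from disk."""
    return {"n_docs": 2864}

index = load_index("indexdir/")
st.write(f"Docs currently held in memory: {index['n_docs']}")
# Running `llmsearch index update ...` in another terminal changes the files
# on disk, but this cached object is not reloaded until the webapp process
# restarts (or the cache is cleared).
```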
@snexus I still get this error even when I completely restart the webapp process and refresh the browser GUI:
> But then again, if I add another entry for another programming language, the IndexError persists.
Just to understand better how to reproduce:
- You are creating an index with a specific doc_path configured
- You are adding a new doc path to the config and running "llmsearch index update ...", and it fails?
I don't think I tested that scenario - the assumption was that the doc paths are static, while the documents within the configured paths can be updated or removed. I agree it is not an intuitive user experience.
For now, if you add a new path, you should recreate the index. I will try to fix it in the near future.
@snexus I observed the following:
- Create an index with 1 doc_path configured, use the software as usual
- Add another doc_path with scan_extensions: [cpp] in the config and run llmsearch index update ... without restarting the webapp; observe the IndexError in the webapp GUI
- Restart the webapp; the IndexError goes away
- Add another doc_path with scan_extensions: [hs] in the config and run llmsearch index update ... without restarting the webapp; observe the IndexError in the webapp GUI; this time, restarting the webapp does not make it go away