IndexError: index (2864) out of range when not re-creating index or restarting webapp after config change
I indexed some .cpp files as described in https://github.com/snexus/llm-search/issues/90#issuecomment-2920852058: I added a - doc_path: ... entry to the config and ran llmsearch index update, but without restarting the llmsearch interact webapp ... process.
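For reference, a minimal sketch of the kind of entry that was added (the parent keys and the path are illustrative, not copied from my actual config):

```yaml
# Hypothetical excerpt from the llmsearch YAML config; surrounding keys omitted.
# New document source added before running llmsearch index update:
    - doc_path: /home/ubuntu/my-cpp-project   # illustrative path
      scan_extensions:
        - cpp
```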
When I then query something via the web UI, I get:
2025-05-30 00:18:38.969 | DEBUG | __main__:<module>:246 - CONFIG FILE: /home/ubuntu/llm-search/configs/niklas-config-1.yaml
2025-05-30 00:18:38.975 | DEBUG | llmsearch.ranking:get_relevant_documents:105 - Evaluating query: What's the name of the API endpoint that generates thumbnails?
2025-05-30 00:18:38.975 | INFO | llmsearch.ranking:get_relevant_documents:107 - Adding query prefix for retrieval: query:
2025-05-30 00:18:38.975 | INFO | llmsearch.splade:query:248 - SPLADE search will search over all documents of chunk size: 1024. Number of docs: 2865
────────────────────────── Traceback (most recent call last) ───────────────────────────
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_code.py:121 in exec_func_with_error_handling
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py:645 in code_to_exec
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:342 in <module>
339 │ │ │ │ conv_history_rewrite_query
340 │ │ │ )
341 │ │
❱ 342 │ │ output = generate_response(
343 │ │ │ question=text,
344 │ │ │ use_hyde=st.session_state["llm_bundle"].hyde_enabled,
345 │ │ │ use_multiquery=st.session_state["llm_bundle"].multiquery_enabled,
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:219 in __call__
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:261 in _get_or_create_cached_value
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:320 in _handle_cache_miss
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:175 in generate_response
172 ):
173 │ # _config and _bundle are under scored so paratemeters aren't hashed
174 │
❱ 175 │ output = get_and_parse_response(
176 │ │ query=question, config=_config, llm_bundle=_bundle, label=label_filter
177 │ )
178 │ return output
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/process.py:66 in get_and_parse_response
63 │ │ offset_max_chars = 0
64 │
65 │ semantic_search_config = config.semantic_search
❱ 66 │ most_relevant_docs, score = get_relevant_documents(
67 │ │ original_query, queries, llm_bundle, semantic_search_config, label=lab
68 │ │ offset_max_chars = offset_max_chars
69 │ )
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/ranking.py:109 in get_relevant_documents
106 │ │ │ if config.query_prefix:
107 │ │ │ │ logger.info(f"Adding query prefix for retrieval: {config.query
108 │ │ │ │ query = config.query_prefix + query
❱ 109 │ │ │ sparse_search_docs_ids, sparse_scores = sparse_retriever.query(
110 │ │ │ │ search=query, n=config.max_k, label=label, chunk_size=chunk_si
111 │ │ │ )
112
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/splade.py:253 in query
250 │ │ │ )
251 │ │
252 │ │ # print(indices)
❱ 253 │ │ embeddings = self._embeddings[indices] # type: ignore
254 │ │ ids = self._ids[indices] # type: ignore
255 │ │ l2_norm_matrix = scipy.sparse.linalg.norm(embeddings, axis=1)
256
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:30 in __getitem__
27 │ This class provides common dispatching and validation logic for indexing.
28 │ """
29 │ def __getitem__(self, key):
❱ 30 │ │ index, new_shape = self._validate_indices(key)
31 │ │
32 │ │ # 1D array
33 │ │ if len(index) == 1:
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:288 in _validate_indices
285 │ │ │ │ index_ndim = tmp_ndim
286 │ │ │ else: # dense array
287 │ │ │ │ N = self._shape[index_ndim]
❱ 288 │ │ │ │ idx = self._asindices(idx, N)
289 │ │ │ │ index.append(idx)
290 │ │ │ │ array_indices.append(index_ndim)
291 │ │ │ │ index_ndim += 1
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:332 in _asindices
329 │ │ # Check bounds
330 │ │ max_indx = x.max()
331 │ │ if max_indx >= length:
❱ 332 │ │ │ raise IndexError('index (%d) out of range' % max_indx)
333 │ │
334 │ │ min_indx = x.min()
335 │ │ if min_indx < 0:
────────────────────────────────────────────────────────────────────────────────────────
IndexError: index (2864) out of range
It seems to be fixed when I restart llmsearch interact webapp AND run llmsearch index create ... instead of llmsearch index update ...
Is that expected?
If so, it would be nice to get a clearer error than IndexError, telling me that I have to restart the whole webapp after changing the config.
But then again, if I add another entry for another programming language, the IndexError persists.
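For what it's worth, the scipy error itself just means a requested row index is at least as large as the number of rows in the sparse matrix being indexed. A minimal sketch of that failure mode, assuming the webapp still holds an older, smaller embeddings matrix in memory while the candidate document indices already reflect the updated, larger index (my own illustration, not llmsearch code):

```python
import numpy as np
import scipy.sparse

# Stale in-memory SPLADE embeddings: 2864 rows (state from before `index update`).
embeddings = scipy.sparse.csr_matrix(np.random.rand(2864, 8))

# Candidate document indices computed against the updated index of 2865 docs.
indices = np.array([10, 500, 2864])

try:
    _ = embeddings[indices]  # row 2864 does not exist in the cached matrix
except IndexError as exc:
    print(exc)  # index (2864) out of range
```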
Updating or creating an index via the CLI and via the web interface are independent operations, i.e. when the index is updated from the CLI, the web UI is not aware of it.
You can update the index directly via the web UI, as shown below:
Let me know if it works for you.
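For context on why the two are independent: a Streamlit app typically loads its index once per process and keeps it in memory, so files rewritten on disk by a separate CLI run are not picked up until the process is restarted. A hedged sketch of that pattern, with purely illustrative names rather than the actual llmsearch webapp code:

```python
import streamlit as st

@st.cache_resource  # loaded once per Streamlit process
def load_index(path: str) -> dict:
    """Stub standing in for loading embeddings/ids from disk."""
    return {"n_docs": 2864}

index = load_index("indexdir/")
st.write(f"Docs currently held in memory: {index['n_docs']}")
# Running `llmsearch index update ...` in another terminal changes the files
# on disk, but this cached object is not reloaded until the webapp process
# restarts (or the cache is cleared).
```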
@snexus I still get this error even when I completely restart the webapp process and refresh the browser GUI:
> But then again, if I add another entry for another programming language, the IndexError persists.
Just to understand better how to reproduce:
- You are creating an index with a specific doc_path configured
- You are adding a new doc path to the config and running "llmsearch index update ...", and it fails?
I don't think I tested that scenario - the assumption was that the doc paths are static, while the documents within the configured paths can be updated or removed. I agree it is not an intuitive user experience.
For now, if you add a new path, you should recreate the index. I will try to fix it in the near future.
@snexus I observed the following:
- Create an index with 1 doc_path configured, use the software as usual
- Add another doc_path with scan_extensions: [cpp] in the config and run llmsearch index update ... without restarting the webapp; observe the IndexError in the webapp GUI
- Restart the webapp; the IndexError goes away
- Add another doc_path with scan_extensions: [hs] in the config and run llmsearch index update ... without restarting the webapp; observe the IndexError in the webapp GUI; this time, restarting the webapp does not make it go away