local-rag

Streamlit cache leading to empty index

Open • tinamil opened this issue 1 year ago • 3 comments

I deployed with Docker and uploaded a local file. When I tried to chat, I got a blank response and the following error:

local-rag | 2024-05-02 12:28:54,674 - ollama - ERROR - Ollama chat stream error: 'HuggingFaceEmbedding' object has no attribute '_model'

However, it works normally when I upload a website instead.

I traced the problem to line 114 of utils/llama_index.py, the @st.cache_data(show_spinner=False) decorator on the create_index(_documents) function. One workaround is to comment out that line, and then local files work again. I believe create_index is called while the documents are being uploaded, before they have been saved to disk and read into memory. The index is therefore built from an empty document list, and Streamlit caches that result instead of regenerating the index when the query comes through.

tinamil • May 02 '24 16:05

The cache was added to speed up subsequent chat messages: Streamlit reruns the entire app on every interaction, which would otherwise trigger embedding again. It certainly has edge cases and almost creates more problems than it solves.

This is sort of outlined in the Known Issues section.

I'll try to take a second look at this and see if there's a better option.

jonfairbanks • May 03 '24 04:05

I found some additional information: Streamlit's @st.cache_data does not hash parameters whose names begin with an underscore. https://docs.streamlit.io/develop/concepts/architecture/caching#excluding-input-parameters

That is most likely why create_index does not detect that the documents have changed.
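To make the underscore rule concrete, here is a toy, stdlib-only memoizer that imitates just that one documented behavior of @st.cache_data. It is not Streamlit's implementation, and cache_like_streamlit is a hypothetical name. With create_index(_documents) decorated this way, a first call made while the document list is still empty pins an empty result that later calls keep getting back:

```python
import functools
import inspect


def cache_like_streamlit(func):
    """Toy memoizer imitating one documented st.cache_data rule: parameters
    whose names start with "_" are left out of the cache key.
    Hypothetical helper for illustration; NOT Streamlit's implementation."""
    cache = {}
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Key on the non-underscore arguments only (this toy requires them
        # to be hashable; real Streamlit uses its own hashing machinery).
        key = tuple(
            (name, value)
            for name, value in bound.arguments.items()
            if not name.startswith("_")
        )
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


@cache_like_streamlit
def create_index(_documents):
    # Stand-in for building a vector index: just records the document count.
    return {"num_docs": len(_documents)}


# First call: made before the uploaded file has been read, so the list is empty.
first = create_index([])
# Second call: the file is now loaded, but _documents is not part of the key,
# so the cached (empty) result is returned instead of a new index.
second = create_index(["contents of uploaded.pdf"])
print(first, second)  # {'num_docs': 0} {'num_docs': 0}
```

Because the key is an empty tuple on both calls, the second call never reaches the function body, which matches the "empty index" symptom described above.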

I tried changing line 115 to def create_index(documents): and line 134 to documents=documents, show_progress=True, i.e., removing the underscore from the parameter. However, that caused more errors, which is likely why the underscore existed in the first place:

llama_index - ERROR - Error when creating Query Engine: Cannot hash argument 'documents' (of type builtins.list) in 'create_index'.
To address this, you can tell Streamlit not to hash this argument by adding a
leading underscore to the argument's name in the function signature:

@st.cache_data
def create_index(_documents, ...):
    ...
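One possible way around both problems, sketched here with the same kind of toy memoizer rather than Streamlit itself, is to keep _documents excluded from hashing and pass a second, hashable argument that identifies the content, such as a digest of the document texts. The names documents_fingerprint and docs_key are hypothetical and are not part of local-rag or Streamlit:

```python
import functools
import hashlib
import inspect


def cache_like_streamlit(func):
    """Toy memoizer (NOT Streamlit's implementation): arguments whose
    names start with "_" are excluded from the cache key."""
    cache = {}
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = tuple(
            (name, value)
            for name, value in bound.arguments.items()
            if not name.startswith("_")
        )
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


def documents_fingerprint(texts):
    """Hypothetical helper: SHA-256 over the document texts, so the key
    changes whenever the uploaded content changes."""
    h = hashlib.sha256()
    for text in texts:
        data = text.encode("utf-8")
        # Length-prefix each document so ["ab", "c"] and ["a", "bc"] differ.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


@cache_like_streamlit
def create_index(_documents, docs_key):
    # _documents is still skipped when building the key; docs_key (a plain,
    # hashable string) carries the identity of the documents instead.
    return {"num_docs": len(_documents)}


empty = []
loaded = ["contents of uploaded.pdf"]
first = create_index(empty, docs_key=documents_fingerprint(empty))
second = create_index(loaded, docs_key=documents_fingerprint(loaded))
print(first, second)  # {'num_docs': 0} {'num_docs': 1}
```

In the real app the equivalent change would be adding such a key parameter to create_index and computing it from the uploaded files before the call. That is an assumption about how the code could be adapted, not a description of the fix that was eventually merged.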

tinamil • May 03 '24 17:05

I think this is fixed in #54

JoepdeJong • May 21 '24 09:05