
🐛 Bug Report: Debugging middleware caught exception in streamed response at a point where response headers were already sent

Open ansionfor opened this issue 2 years ago • 11 comments

📜 Description

Screenshot 2023-10-10 23 11 13, Screenshot 2023-10-10 23 10 48 (attached screenshots)

👟 Reproduction steps

./setup.sh

👍 Expected behavior

Successful query response.

👎 Actual Behavior with Screenshots

Same as described above (see the screenshots).

💻 Operating system

MacOS

What browsers are you seeing the problem on?

Chrome

🤖 What development environment are you experiencing this bug on?

Docker

🔒 Did you set the correct environment variables in the right path? List the environment variable names (not values please!)

No response

📃 Provide any additional context for the Bug.

No response

📖 Relevant log output

No response

👀 Have you spent some time to check if this bug has been raised before?

  • [X] I checked and didn't find similar issue

🔗 Are you willing to submit PR?

None

🧑‍⚖️ Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

ansionfor avatar Oct 10 '23 15:10 ansionfor

I encountered the same issue. I created the venv and ran ./setup.sh inside it; every service runs successfully on Docker, but the error happens when submitting a query. I use Python 3.10.

mehdimo avatar Oct 11 '23 00:10 mehdimo

Did you choose option 1 or 2? Please note it might take a while for a response, too.

dartpain avatar Oct 11 '23 17:10 dartpain

I used option 1.

mehdimo avatar Oct 11 '23 20:10 mehdimo

Any error trace in your console?

dartpain avatar Oct 11 '23 21:10 dartpain

Here is the error trace:

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 161.88 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File "/Users/user1/Documents/Github/DocsGPT/venv/lib/python3.11/site-packages/werkzeug/wsgi.py", line 256, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "/Users/user1/Documents/Github/DocsGPT/venv/lib/python3.11/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
    for item in iterable:
  File "/Users/user1/Documents/Github/DocsGPT/application/api/answer/routes.py", line 120, in complete_stream
    docs = docsearch.search(question, k=2)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/Documents/Github/DocsGPT/application/vectorstore/faiss.py", line 20, in search
    return self.docsearch.similarity_search(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/Documents/Github/DocsGPT/venv/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 334, in similarity_search
    docs_and_scores = self.similarity_search_with_score(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/Documents/Github/DocsGPT/venv/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 276, in similarity_search_with_score
    docs = self.similarity_search_with_score_by_vector(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/Documents/Github/DocsGPT/venv/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 219, in similarity_search_with_score_by_vector
    scores, indices = self.index.search(vector, k if filter is None else fetch_k)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/Documents/Github/DocsGPT/venv/lib/python3.11/site-packages/faiss/class_wrappers.py", line 329, in replacement_search
    assert d == self.d
           ^^^^^^^^^^^
AssertionError
127.0.0.1 - - [11/Oct/2023 14:54:13] "POST /stream HTTP/1.1" 200 -

mehdimo avatar Oct 11 '23 21:10 mehdimo

Encountered the same issue on macOS when running ./setup.sh (with Python in a venv) -> Option 1. Got this stack trace:

...................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 161.88 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 
Shape of query vector: (1, 768)
Query vector dimension: 768
Faiss index dimension: 1536
Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/werkzeug/wsgi.py", line 256, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
    for item in iterable:
  File "/Users/axel/repos/DocsGPT/application/api/answer/routes.py", line 120, in complete_stream
    docs = docsearch.search(question, k=2)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axel/repos/DocsGPT/application/vectorstore/faiss.py", line 20, in search
    return self.docsearch.similarity_search(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 334, in similarity_search
    docs_and_scores = self.similarity_search_with_score(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 276, in similarity_search_with_score
    docs = self.similarity_search_with_score_by_vector(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 219, in similarity_search_with_score_by_vector
    scores, indices = self.index.search(vector, k if filter is None else fetch_k)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/faiss/class_wrappers.py", line 333, in replacement_search
    assert d == self.d
           ^^^^^^^^^^^
AssertionError
127.0.0.1 - - [12/Oct/2023 12:54:06] "POST /stream HTTP/1.1" 200 -

When adding some print statements I got these values of d and self.d:

Shape of x: (1, 768)
d: 768
self.d: 1536

And when I just bypassed the assert d == self.d I got this downstream error:

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 161.88 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 
docs:
[]
question:
Hello
Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/werkzeug/wsgi.py", line 256, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "/Users/axel/repos/DocsGPT/venv/lib/python3.11/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
    for item in iterable:
  File "/Users/axel/repos/DocsGPT/application/api/answer/routes.py", line 126, in complete_stream
    docs = [docs[0]]
            ^^^^^^^^
IndexError: list index out of range
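
The IndexError above is easy to reproduce in isolation: routes.py does docs = [docs[0]], which crashes whenever the search returns an empty list (which it does here because the dimension mismatch was bypassed). A minimal sketch of a defensive version, with a hypothetical helper name that is not the actual DocsGPT code:

```python
def first_doc_or_empty(docs):
    """Return a one-element list with the top search hit, or [] when the
    vector search produced no results, instead of raising IndexError."""
    return [docs[0]] if docs else []

print(first_doc_or_empty([]))                  # prints []
print(first_doc_or_empty(["doc_a", "doc_b"]))  # prints ['doc_a']
```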

asoderlind avatar Oct 12 '23 10:10 asoderlind

Ahhh, please try ingesting your own documents and then asking questions about them. Basically, the pre-loaded index will not work; just upload any PDF or doc.

dartpain avatar Oct 12 '23 11:10 dartpain

It worked when I uploaded a document with the assertion commented out.

However, commenting on the original bug: from what I have gathered so far (and please correct me if I'm wrong), the embedding function of the docsgpt-7b-f16.gguf model outputs embedding vectors of length 768, which corresponds to d in the assert d == self.d. However, the FAISS vector store seems to be initialized with an index (corresponding to self) that for some reason assumes a vector length of 1536. I haven't quite figured out where this initialization happens or what kind of index is passed, though.

More than willing to make a PR if we get to the bottom of this (assuming it doesn't get solved before then).
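
The mismatch can be illustrated independently of DocsGPT: faiss's replacement_search wrapper compares the query vector's width d against the index's self.d and asserts they match. A numpy-only sketch of that check, using a hypothetical FakeIndex stand-in rather than a real FAISS index (the real check lives in faiss/class_wrappers.py):

```python
import numpy as np

class FakeIndex:
    """Stand-in for a FAISS index built for a fixed vector dimension."""
    def __init__(self, d):
        self.d = d  # dimension the index was built with

    def search(self, x, k):
        n, d = x.shape
        # Mirrors the assertion in faiss's replacement_search wrapper.
        assert d == self.d, f"query dim {d} != index dim {self.d}"
        return np.zeros((n, k)), np.zeros((n, k), dtype=int)

index = FakeIndex(1536)                      # pre-built index.faiss: d = 1536
query = np.zeros((1, 768), dtype="float32")  # all-mpnet-base-v2 outputs 768 dims
try:
    index.search(query, 2)
except AssertionError as e:
    print(e)  # prints: query dim 768 != index dim 1536
```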

asoderlind avatar Oct 12 '23 16:10 asoderlind

Update: I think the problem is that in this case we load the embedding function from the transformer huggingface_sentence-transformers/all-mpnet-base-v2, which has an embedding length of 768, but the index loaded into the FAISS vector store comes from the file application/index.faiss, which expects an embedding length of 1536.

So basically here:

class FaissStore(BaseVectorStore):

    def __init__(self, path, embeddings_key, docs_init=None):
        super().__init__()
        self.path = path
        if docs_init:
            self.docsearch = FAISS.from_documents(
                docs_init, self._get_embeddings(settings.EMBEDDINGS_NAME, embeddings_key)
            )
        else:
            self.docsearch = FAISS.load_local(
                self.path, self._get_embeddings(settings.EMBEDDINGS_NAME, settings.EMBEDDINGS_KEY)
            )

when docs_init is falsy, we take the path application/, which leads to application/index.faiss (d=1536), but EMBEDDINGS_NAME is huggingface_sentence-transformers/all-mpnet-base-v2, which produces vectors of length 768, hence the mismatch.
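
Given that diagnosis, one possible fix sketch (hypothetical, not the actual DocsGPT code or PR) is to validate the loaded index against the configured embedding model up front, so the mismatch fails fast at startup with a clear message instead of an AssertionError mid-stream:

```python
def check_index_compatible(embedding_dim: int, index_dim: int) -> None:
    """Fail fast when the embedding model's output dimension does not
    match the FAISS index on disk, e.g. 768 from all-mpnet-base-v2 vs
    1536 from the pre-built application/index.faiss."""
    if embedding_dim != index_dim:
        raise ValueError(
            f"Embedding dimension {embedding_dim} != index dimension "
            f"{index_dim}; re-ingest your documents with the configured "
            "embedding model instead of using the pre-built index."
        )
```

A check like this could run in FaissStore.__init__ right after FAISS.load_local, comparing the index's d to a probe embedding's length.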

asoderlind avatar Oct 12 '23 17:10 asoderlind

I guess OpenAI's embeddings have length 1536, so maybe that's why it works when using the OpenAI API?

asoderlind avatar Oct 12 '23 17:10 asoderlind

Thank you for the PR!

dartpain avatar Oct 14 '23 19:10 dartpain