
Debugging a critical failure

Open Swader opened this issue 2 years ago • 9 comments

Before building quivr out further, I would like to look into this problem: https://twitter.com/bitfalls/status/1657736877007405056?s=20

Specifically, I fed it around 30 markdown documents, but it looks like it does not really follow context and has no real idea what is going on. Every article has a "status" at the top; when I asked it to give me all active ones, it just gave me random ones, while asking for the status of one I know is active tells me it is active. This shows me that it can check the status of a targeted ask, but cannot reason over the whole corpus of information. I wonder why this happens and whether we can debug it, because unless this is resolved, Quivr is dangerously unreliable to use.

@StanGirard any ideas on what to do to test this and come to a satisfactory output?

Swader avatar May 17 '23 04:05 Swader

I guess there are two potential issues:

  1. Related to the prompt: somehow the retriever wasn't able to locate all the correct document embeddings for that prompt.
  2. Related to document splitting: the document chunk doesn't have all the information it needs to answer the question.

I think one thing we could do is add a debug mode, which shows which documents were included (and which were excluded) when answering the question. It would be super helpful for debugging :D
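A minimal sketch of what such a debug mode could return (hypothetical names, not Quivr's actual API): the answer plus the IDs of the chunks that were actually retrieved, so you can see which documents the model consulted.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str

def answer_with_debug(question, retriever, llm):
    """Return the answer together with the chunks that were retrieved,
    so users can see which documents the model was able to consult."""
    chunks = retriever(question)
    context = "\n---\n".join(c.text for c in chunks)
    answer = llm(question, context)
    return answer, [c.doc_id for c in chunks]

# Toy stand-ins just to show the shape of the output:
docs = [Chunk("a.md", "status: active"), Chunk("b.md", "status: archived")]
retriever = lambda q: [c for c in docs if "status" in c.text]
llm = lambda q, ctx: f"Answer based on {ctx.count('---') + 1} chunk(s)"

answer, sources = answer_with_debug("which notes are active?", retriever, llm)
print(sources)  # ['a.md', 'b.md']
```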

Shaunwei avatar May 17 '23 05:05 Shaunwei

@Swader thx for submitting the issue 👍

Shaunwei avatar May 17 '23 05:05 Shaunwei

Hey @Swader, Thanks for the issue.

The issue might come from an omission on my part. When I created Quivr I didn't update the default settings of the VectorStore retriever.

What does that mean? If 20 of your files contain the relevant information, the retriever might only fetch the top 5.

We need to implement a way of increasing the retriever's capabilities, but we face some issues:

  • Users can choose their own chunk size, so if we set the retriever to 100 chunks we could exceed the model's max token limit.
  • The default chunk size is probably suboptimal for a second brain.

More info on retrievers here https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/vectorstore-retriever.html
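The top-k behaviour described above can be illustrated without any LangChain dependency. In this sketch (toy term-overlap scoring, hypothetical data), 6 of 30 documents are marked active, but a retriever capped at a default of k=4 can surface at most 4 of them, which matches the symptom in the original report; raising k finds them all, at the cost of a bigger prompt.

```python
def retrieve(query_terms, docs, k=4):
    """Rank documents by naive term overlap and return only the top k.
    With a small default k, relevant documents ranked below position k
    are silently dropped even though they match the query."""
    scored = sorted(
        docs,
        key=lambda d: sum(term in d["text"] for term in query_terms),
        reverse=True,
    )
    return scored[:k]

# 30 hypothetical markdown notes, 6 of them marked active.
docs = [
    {"id": i, "text": f"status: {'active' if i < 6 else 'archived'}"}
    for i in range(30)
]

hits = retrieve(["status", "active"], docs)  # default k=4
print(sum("active" in d["text"] for d in hits))  # 4 -> two active docs missed

hits = retrieve(["status", "active"], docs, k=10)  # larger k
print(sum("active" in d["text"] for d in hits))  # 6 -> all active docs found
```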

StanGirard avatar May 17 '23 07:05 StanGirard

I see. So what is the best approach to debugging this? I am hesitant to integrate Quivr any more deeply into my workflow until I can be sure it is aware of the whole context. Would it help if I generated a mock BD database similar to mine, but with fake companies and statuses, that we could evaluate against?

Swader avatar May 17 '23 07:05 Swader

We need to implement a way of increasing the retriever's capabilities, but we face some issues:

  • Users can choose their own chunk size, so if we set the retriever to 100 chunks we could exceed the model's max token limit.
  • The default chunk size is probably suboptimal for a second brain.

How about solving this with Whisper.cpp or LLaMA.cpp models?

mi-hol avatar May 19 '23 15:05 mi-hol

@StanGirard how do we re-test for full context awareness?

Swader avatar Jun 01 '23 13:06 Swader

It has been another month, @StanGirard. Can we get this fully end-to-end tested?

Swader avatar Jul 05 '23 10:07 Swader

We have started implementing tests with Pytest for the backend and Vitest for the frontend :)

Feel free to add your own tests!

StanGirard avatar Jul 05 '23 10:07 StanGirard

How can I use tests to make sure it knows the knowledge I expect it to know? Any simple guide? Would love to contribute with cases.

Swader avatar Jul 14 '23 04:07 Swader

Thanks for your contributions, we'll be closing this issue as it has gone stale. Feel free to reopen if you'd like to continue the discussion.

github-actions[bot] avatar Aug 22 '23 16:08 github-actions[bot]

It is stale, but remains unresolved.

Swader avatar Aug 22 '23 16:08 Swader

Too many things to do 😁

StanGirard avatar Aug 22 '23 21:08 StanGirard

Seems critical though? End-to-end tests (generate an example DB using ChatGPT, then run E2E tests on it that have to pass on every release) could be pretty valuable, no? This seems to be failing at the core functionality of Quivr, so I feel it should be addressed and I want to help, especially now that OpenAI is also pushing trainable data.

Swader avatar Aug 23 '23 10:08 Swader

You could try using langsmith. They have some pretty cool features.

StanGirard avatar Aug 23 '23 20:08 StanGirard

For what exactly, do you mean for the testing? I do not think it is necessary: just generating a mock DB that we can test on, and then running simple JS tests for input/output, would be quite adequate.
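The mock-DB idea could be sketched like this (everything here is hypothetical; a real version would call Quivr's actual query endpoint instead of the `query_brain` stub):

```python
# End-to-end check over a generated mock knowledge base.

MOCK_DOCS = {
    "acme.md": "company: Acme\nstatus: active",
    "globex.md": "company: Globex\nstatus: dormant",
    "initech.md": "company: Initech\nstatus: active",
}

def query_brain(question):
    # Stub: scans every document; a real test would hit the pipeline.
    return sorted(
        doc.splitlines()[0].split(": ")[1]
        for doc in MOCK_DOCS.values()
        if "status: active" in doc
    )

def test_lists_all_active_companies():
    # The regression from this issue: the answer must cover the WHOLE
    # corpus, not just whichever chunks happened to be retrieved.
    assert query_brain("list all active companies") == ["Acme", "Initech"]

test_lists_all_active_companies()
print("ok")
```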

Swader avatar Aug 26 '23 06:08 Swader

Hi @Swader, thanks for the suggestion and your interest in Quivr. We're working on a UX change and a backend refactoring at the moment, so we haven't been able to prioritize e2e tests. Feel free to create an issue and a PR for the e2e tests; we will follow it closely, and it would help tremendously!

gozineb avatar Aug 29 '23 12:08 gozineb

Thanks for your contributions, we'll be closing this issue as it has gone stale. Feel free to reopen if you'd like to continue the discussion.

github-actions[bot] avatar Sep 28 '23 16:09 github-actions[bot]