Debugging a critical failure
Before building quivr out further, I would like to look into this problem: https://twitter.com/bitfalls/status/1657736877007405056?s=20
Specifically, I fed it around 30 markdown documents, and it does not seem to follow context; it has no real grasp of the corpus as a whole. Every article has a "status" field at the top. When I ask it to list all active ones, it returns a seemingly random subset, yet when I ask for the status of a specific document I know is active, it correctly says it is active. So it can answer a targeted status check, but it cannot reason over the whole corpus of information. I'd like to understand why this happens and how to debug it, because until this is resolved, Quivr is dangerously unreliable to use.
@StanGirard any ideas on what to do to test this and come to a satisfactory output?
I can see two potential issues:
- The prompt: the retrieval step somehow failed to locate all the relevant document embeddings for that query.
- The document splitting: the retrieved chunk doesn't contain all the information needed to answer the question.
I think one thing we could do is add a debug mode that shows which documents were retrieved and actually used to answer the question. It would be super helpful for debugging :D
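A rough illustration of what that debug mode could look like (a hedged sketch, not Quivr's actual code: `return_source_documents` is real LangChain API, but the wiring is assumed, and `retriever` stands in for whatever vector store retriever Quivr uses):

```python
# Hedged sketch of a "debug mode": surface the retrieved chunks
# alongside the answer so we can see what the model actually saw.
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=retriever,  # assumed: Quivr's vector store retriever
    return_source_documents=True,
)
result = qa({"query": "Which companies have status: active?"})
print(result["result"])
for doc in result["source_documents"]:
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```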
@Swader thx for submitting the issue 👍
Hey @Swader, thanks for the issue.
The issue might come from an omission on my part: when I created Quivr, I didn't update the default settings of the VectorStore retriever.
What does that mean? Even if 20 of your files contain the information, the retriever might only fetch the top 5.
We need to implement a way of increasing the retriever's reach, but we face some issues:
- Users can choose their own chunk size, so if we set the retriever to return 100 chunks we can exceed the model's max token limit.
- The default chunk size is probably suboptimal for a second brain.
More info on retrievers here https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/vectorstore-retriever.html
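For illustration, a minimal sketch of raising the retriever's top-k in LangChain (FAISS and the toy documents here are stand-ins, not Quivr's actual stack):

```python
# Hedged sketch: LangChain retrievers default to returning only the
# top few chunks, so corpus-wide questions never see most documents.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

texts = [f"status: active\n# Company {i} notes" for i in range(30)]
store = FAISS.from_texts(texts, OpenAIEmbeddings())

# Raise k so a query spanning the whole corpus can reach every document;
# beware that k * chunk_size must still fit in the model's context window.
retriever = store.as_retriever(search_kwargs={"k": 30})
docs = retriever.get_relevant_documents("Which companies are active?")
```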
I see. So what is the best approach to debugging this? I am hesitant to integrate Quivr any more deeply into my workflow until I can be sure it is aware of the whole context. Would it help if I generated a mock BD database similar to mine, with fake companies and statuses, that we could run evaluations against?
> We need to implement a way of increasing the retriever's reach, but we face some issues:
> - Users can choose their own chunk size, so if we set the retriever to return 100 chunks we can exceed the model's max token limit.
> - The default chunk size is probably suboptimal for a second brain.
How about solving this with Whisper.cpp or LLaMA.cpp models?
@StanGirard how do we re-test for full context awareness?
It has been another month, @StanGirard. Can we get this fully tested end to end?
We have started implementing tests with Pytest for the backend and Vitest for the frontend :)
Feel free to add your own tests.
How can I use tests to make sure it knows the knowledge I expect it to know? Is there a simple guide? I'd love to contribute cases.
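Something like this is roughly what I have in mind (a hedged sketch; the `brain` fixture, its `ask` method, and `make_test_brain` are hypothetical placeholders, not Quivr's actual test API):

```python
# Hedged sketch of a knowledge test: seed a brain with known docs,
# then assert that a corpus-wide question returns the expected set.
import pytest

@pytest.fixture
def brain(tmp_path):
    docs = {
        "acme.md": "status: active\n# Acme",
        "globex.md": "status: inactive\n# Globex",
        "initech.md": "status: active\n# Initech",
    }
    for name, body in docs.items():
        (tmp_path / name).write_text(body)
    return make_test_brain(tmp_path)  # hypothetical helper

def test_lists_all_active_companies(brain):
    answer = brain.ask("List every company with status: active")
    assert "Acme" in answer and "Initech" in answer
    assert "Globex" not in answer
```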
Thanks for your contributions, we'll be closing this issue as it has gone stale. Feel free to reopen if you'd like to continue the discussion.
It is stale, but remains unresolved.
Too many things to do 😁
Seems critical though? End-to-end tests (generate an example DB using ChatGPT, then run E2E tests against it that must pass on every release) could be pretty valuable, no? This failure is in Quivr's core functionality, so I feel it should be addressed, and I want to help, especially now that OpenAI is also pushing trainable data.
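For example, the fake corpus could be generated with a tiny script like this (hedged sketch; the file layout and status values are invented to mirror my real docs):

```python
# Hedged sketch: generate 30 fake markdown docs with a "status" header,
# plus a ground-truth file the E2E assertions can check against.
import random
from pathlib import Path

STATUSES = ["active", "inactive", "paused"]
out = Path("mock_brain")
out.mkdir(exist_ok=True)

active = []
for i in range(30):
    name = f"Company{i:02d}"
    status = random.choice(STATUSES)
    if status == "active":
        active.append(name)
    (out / f"{name.lower()}.md").write_text(
        f"status: {status}\n\n# {name}\n\nFake BD notes for {name}.\n"
    )

# The answer to "list all active companies" must match this set exactly.
(out / "expected_active.txt").write_text("\n".join(sorted(active)))
```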
You could try using langsmith. They have some pretty cool features.
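A minimal sketch of turning tracing on (the env var names below are LangSmith's documented ones; the project name is arbitrary):

```python
# Hedged sketch: with these set, every LangChain run (retrieval, prompt,
# completion) is traced in the LangSmith UI for inspection.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "quivr-debug"  # arbitrary project name
```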
For what exactly, do you mean for the testing? I don't think it's necessary; I think just generating a mock DB that we can test on, and then running simple JS tests for input/output, would be quite adequate.
Hi @Swader, thanks for the suggestion and your interest in Quivr. We're working on a UX change and a backend refactoring at the moment, so we haven't been able to prioritize e2e tests. Feel free to create an issue and a PR for the e2e tests; we will follow it closely, and it would help tremendously!!
Thanks for your contributions, we'll be closing this issue as it has gone stale. Feel free to reopen if you'd like to continue the discussion.