
docs: detailed explanations about query_mode and response_mode are lacking

Open thiswillbeyourgithub opened this issue 2 years ago • 3 comments

Hi,

Awesome project, but IMO it's still somewhat hard to get a precise idea of how it all fits together without having to dig into the code.

I see that the project is moving so fast that it's probably hard to keep up with the documentation, which is why I waited so long to open this issue. I've been using llama_index for quite a few days and am still not aware of where query_mode and response_mode are explained in detail; I think such an explanation would be very useful.

Best I found was this: https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#setting-response-mode

Thanks!

thiswillbeyourgithub avatar Mar 23 '23 18:03 thiswillbeyourgithub

Hi @thiswillbeyourgithub

Query mode is actually under mode in the docs. Here's the page: https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#setting-mode

Basically, for list and tree indexes, you can specify whether or not to use embeddings.
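For example, a rough sketch using the old top-level query API from that docs page (docs is a placeholder for your loaded documents):

from llama_index import GPTTreeIndex

index = GPTTreeIndex.from_documents(docs)

# default: the LLM itself traverses the tree, choosing which child node to follow
response = index.query("your question")

# embedding: child nodes are selected by embedding similarity instead of LLM calls
response = index.query("your question", mode="embedding")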

I haven't seen answer_mode before. Where did you see that?

logan-markewich avatar Mar 23 '23 23:03 logan-markewich

Hi!

I'm sorry, I meant response_mode, not answer_mode. I've corrected this above.

I've seen the page you linked multiple times and still think it isn't informative enough. For example, I'm so far using a Vector Store index, not a list index. Hence, the following explanation:

For instance, you can choose to specify mode="default" or mode="embedding" for a list index. mode="default" will create and refine an answer sequentially through the nodes of the list. mode="embedding" will synthesize an answer by fetching the top-k nodes by embedding similarity.

isn't very helpful:

  1. It begins with "for instance", which clearly indicates that the behavior is not fully explained here.
  2. It seems to only refer to the list index.
  3. I don't understand how the 'default' mode can NOT be using embeddings. Does it go through every node of the list and refine the answer? That doesn't look efficient for a large index, and using embeddings with a high top_k sounds more useful to me, so why would that not be the default? And if I'm using a vector store index, does it go through all the chunks of my index? It doesn't seem like it, but that's what I understand from the text.
  4. No error is thrown if you specify a top_k value while using mode="default", which makes me think I clearly have no clue how this default mode works.

I have so far added several documents to vector store indexes (e.g. 10 documents in 1 index related to computer science, 10 documents in 1 index related to medical stuff, etc.). Should I have done otherwise? My understanding is that vector stores are general purpose and store each chunk of text as a node with a specific embedding.

Regarding this documentation page: https://gpt-index.readthedocs.io/en/latest/reference/query.html

I see all the QueryModes listed and what they can be used for, but no explanation of how any of them works. This makes it tricky to understand exactly what I could do with each one in my projects.

Regarding the response_mode, I have the same remarks; for example, I think tree_summarize should be given an example.

I am absolutely amazed by this project and am certain you are building the future, but I think it's worth mentioning these things that might be obvious to power-power-users but not to power-users who would love to better understand how it all works.

Thanks a lot and again, I'm really trying to be helpful and not sound bad :)

thiswillbeyourgithub avatar Mar 24 '23 12:03 thiswillbeyourgithub

  1. There are technically more "options" here - but they aren't fully useful/customizable. I agree there's room to improve the UX here
  2. List and Tree index are the only two indexes where setting mode is useful (AFAIK).
  3. The list index by default checks every node. This is best for creating summaries, and also for when you want to make sure the LLM doesn't miss any context. A vector index, of course, uses embeddings to fetch only the relevant context; by default, there is a similarity_top_k option in the query that is set to one. The list index can kind of pull double duty: by specifying mode="embedding", it will act as a vector index and calculate the embeddings at query time, whereas in a vector index the embeddings are calculated during index construction (see the sketch after this list).
  4. On a list index, you can pass similarity_top_k when using mode="default", but it will only be used if mode="embedding".
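To make points 3 and 4 concrete, here's a minimal sketch using the top-level query API from the docs linked above (docs and the query strings are placeholders):

from llama_index import GPTListIndex

index = GPTListIndex.from_documents(docs)

# mode="default": every node is visited; an answer is created from the first
# node and then refined against each subsequent node (similarity_top_k is ignored)
response = index.query("What does the author say about X?")

# mode="embedding": the query is embedded at query time and only the
# similarity_top_k most similar nodes are sent to the LLM
response = index.query("What does the author say about X?", mode="embedding", similarity_top_k=3)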

Regarding the vector indexes you have made, it sounds like you've done it properly. When you have several indexes, you can combine them into a composable index: https://gpt-index.readthedocs.io/en/latest/how_to/composability.html https://github.com/jerryjliu/llama_index/blob/main/examples/composable_indices/ComposableIndices.ipynb
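As a rough sketch of what composing your two indexes could look like (cs_index and medical_index stand in for your existing indexes, the summaries are placeholders, and the exact ComposableGraph signature may differ between versions):

from llama_index import GPTListIndex
from llama_index.indices.composability import ComposableGraph

# wrap the domain-specific indexes under a top-level list index; the
# summaries help route each query to the right sub-index
graph = ComposableGraph.from_indices(
    GPTListIndex,
    [cs_index, medical_index],
    index_summaries=[
        "Documents about computer science.",
        "Documents about medicine.",
    ],
)
response = graph.query("What is a B-tree?")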

Regarding the QueryMode, again, this is the same as mode=... when calling query. In general, the only time you adjust this is to use embeddings in a list or tree index. I realize there are a lot of modes listed here, but I refer to point 1 above.

tree_summarize is used when building summaries, typically with a list or tree index, or with a vector index when the top_k is greater than 1.
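For example, a sketch with the same old-style query API (the query string is a placeholder):

# tree_summarize recursively summarizes the candidate nodes bottom-up
# until a single root answer remains
response = index.query("Summarize this document.", response_mode="tree_summarize")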

The examples folder in the repo is also a great source of information for all of these settings and where they are used. The examples there helped me out a ton.

These are all good points, and I totally encourage you to make a PR if there are specific changes you want to make, and we can move the conversation there. Otherwise, we will of course continue trying to improve the docs and overall UX.

Also, the discord is a great place to get help as well! https://discord.gg/2DptqsPH

logan-markewich avatar Mar 24 '23 18:03 logan-markewich

Can anyone please help with doing summarization using tree_summarize?

priyanka-rajeev avatar May 11 '23 10:05 priyanka-rajeev

Closing this issue for now!

Also, for reference @priyanka-rajeev, tree_summarize can be used like this (best results are with a list index, but you can also use a vector index if you don't need to read the entire document; you just might have to set the top k higher):

from llama_index import GPTListIndex, GPTVectorStoreIndex

# list index: every node is read, so the summary covers the whole document
index = GPTListIndex.from_documents(docs)
response = index.as_query_engine(response_mode="tree_summarize").query("What is the summary of this document?")

# vector index: only the top-k retrieved nodes feed into the summary
index = GPTVectorStoreIndex.from_documents(docs)
response = index.as_query_engine(response_mode="tree_summarize", similarity_top_k=5).query("What is the summary of this document?")

logan-markewich avatar Jun 06 '23 02:06 logan-markewich