llama_index
How to concat GPTSimpleVectorIndex?
Hi! I want to combine two GPTSimpleVectorIndex instances into one GPTSimpleVectorIndex.
A list index is one way to do it, but ListIndex queries each GPTSimpleVectorIndex separately.
I want to run a single query over both, so I want to combine the two GPTSimpleVectorIndex instances into one GPTSimpleVectorIndex. Is there a procedure for this?
Thanks!
Hey @kenoharada, we don't have great support for this yet.
Is there a reason that prevents you from building a new GPTSimpleVectorIndex over both sets of documents you have? (Obviously this would be slightly costlier, since you would have to re-embed the documents.)
It's because connectors such as pptx or docx only support building an index from one file at a time?
@Disiok is it not possible to combine these indexes using a tree index? Something like:
from llama_index import GPTTreeIndex, GPTSimpleVectorIndex
# Create subindices for books and lectures
books_index = GPTSimpleVectorIndex(books_documents)
lectures_index = GPTSimpleVectorIndex(lectures_documents)
# Set summary text for each subindex
books_index.set_text("summary_books")
lectures_index.set_text("summary_lectures")
# Create a tree index for routing
combined_index = GPTTreeIndex([books_index, lectures_index])
# Query the combined index
response = combined_index.query(
    "Your query here",
    mode="recursive",
    query_configs=...
)
It's because connectors such as pptx or docx only support building an index from one file at a time?
Connectors such as .pptx or .docx just return Document objects, and an index takes in a list of Document objects, so you can ingest these documents into one index.
Makes sense. But I am having a similar issue: my users upload documents, and when they want to query multiple documents at once, I need to construct a ListIndex and send the query to it. However, ListIndex does not seem to work as a concatenation; rather, it returns the most relevant index in its source nodes (whereas I am looking for the most relevant chunks in the sub-indices). Any ideas, @jerryjliu?
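To make the "most relevant chunks across sub-indices" behaviour concrete, here is a minimal pure-Python sketch. It does not use the LlamaIndex API: `ToyIndex`, `top_k`, and `top_k_across` are hypothetical names, and each "index" is assumed to be just a mapping from chunk text to a precomputed embedding. It takes the top-k chunks from every sub-index and then keeps the global top-k by similarity.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical toy "index": chunk text -> embedding (not a LlamaIndex type).
ToyIndex = Dict[str, Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(index: ToyIndex, query: Vector, k: int) -> List[Tuple[float, str]]:
    """Return the k best (score, chunk) pairs from a single index."""
    scored = ((cosine(vec, query), text) for text, vec in index.items())
    return heapq.nlargest(k, scored)


def top_k_across(indices: List[ToyIndex], query: Vector, k: int) -> List[Tuple[float, str]]:
    """Collect each index's top-k, then keep the global top-k chunks."""
    candidates: List[Tuple[float, str]] = []
    for index in indices:
        candidates.extend(top_k(index, query, k))
    return heapq.nlargest(k, candidates)


# Made-up two-dimensional embeddings, purely for illustration.
books = {"book chunk A": [1.0, 0.0], "book chunk B": [0.0, 1.0]}
lectures = {"lecture chunk A": [0.9, 0.1], "lecture chunk B": [-1.0, 0.0]}

results = top_k_across([books, lectures], query=[1.0, 0.0], k=2)
print([text for _, text in results])  # ['book chunk A', 'lecture chunk A']
```

Scores are only comparable across sub-indices if every index was embedded with the same model, which this sketch assumes.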
@Disiok Thank you for the reply.
Is there a reason that prevents you from building a new GPTSimpleVectorIndex over both sets of documents you have? (Obviously this would be slightly costlier, since you would have to re-embed the documents.)
I was hoping to find a more cost-effective solution for this issue. It's good to know that, as of now, there doesn't seem to be another method available and building a new GPTSimpleVectorIndex over both sets of documents is the current approach.
Thank you very much!
Also relevant to this issue, so copy-pasting:
Surprisingly, the exact feature we seem to be looking for is mentioned here in LangChain: https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html
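The general idea of merging two vector stores without re-embedding can be sketched in plain Python. This is not the LlamaIndex or LangChain API: `ToyVectorStore` and its `add`, `merge_from`, and `query` methods are hypothetical, and the store is assumed to be nothing more than a mapping from chunk id to text and embedding. Merging then amounts to copying the stored entries, so no embedding calls are needed.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class ToyVectorStore:
    """Hypothetical in-memory store: chunk id -> (text, embedding)."""

    entries: Dict[str, Tuple[str, Sequence[float]]] = field(default_factory=dict)

    def add(self, chunk_id: str, text: str, embedding: Sequence[float]) -> None:
        self.entries[chunk_id] = (text, embedding)

    def merge_from(self, other: "ToyVectorStore") -> None:
        """Copy entries from `other`, reusing their stored embeddings."""
        clash = self.entries.keys() & other.entries.keys()
        if clash:
            raise ValueError(f"duplicate chunk ids: {sorted(clash)}")
        self.entries.update(other.entries)

    def query(self, embedding: Sequence[float], k: int = 2) -> List[str]:
        """Return the texts of the k chunks most similar to `embedding`."""

        def score(entry: Tuple[str, Sequence[float]]) -> float:
            _, vec = entry
            dot = sum(x * y for x, y in zip(vec, embedding))
            norm = math.sqrt(sum(x * x for x in vec)) * math.sqrt(
                sum(y * y for y in embedding)
            )
            return dot / norm if norm else 0.0

        ranked = sorted(self.entries.values(), key=score, reverse=True)
        return [text for text, _ in ranked[:k]]


# Made-up embeddings; a real store would hold vectors from an embedding model.
books = ToyVectorStore()
books.add("b1", "book chunk", [1.0, 0.0])

lectures = ToyVectorStore()
lectures.add("l1", "lecture chunk", [0.8, 0.6])
lectures.add("l2", "other lecture chunk", [0.0, 1.0])

books.merge_from(lectures)  # one store, no re-embedding
print(books.query([1.0, 0.0], k=2))  # ['book chunk', 'lecture chunk']
```

A merge like this only gives meaningful rankings when both stores were built with the same embedding model and dimensionality.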
Hi, @kenoharada! I'm here to help the LlamaIndex team manage their backlog and I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you are looking for a way to combine two GPTSimpleVectorIndex into one, so that you can call a single query over both indexes instead of querying each index separately. @Disiok suggests building a new GPTSimpleVectorIndex over both sets of documents, but you are hoping for a more cost-effective solution. @ctle-vn suggests using a tree index to combine the indexes, and @jerryjliu mentions that connectors like .pptx or .docx can be ingested into one index.
Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and we look forward to your response!