Updating an Index.

Open dranastos opened this issue 2 years ago • 15 comments

Does anyone have working sample code for updating an index with new text?

I tried the code in the documentation and it doesn't work. It gives me errors I can't resolve.

Ideally the code should be able to append new text to the original index.

dranastos avatar Feb 07 '23 18:02 dranastos

hi @dranastos could you paste the code / stack trace?

jerryjliu avatar Feb 07 '23 21:02 jerryjliu

import os
import io
import sys

from gpt_index import GPTListIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

def main( directory, textupdate ):

    filedirdata =  directory + "/data/"

    fileindex = directory + '/index.json'
    fileindex2 = directory + '/index2.json'

    index = GPTListIndex([])

    doc_chunks = [fileindex,fileindex2]
    for i, text in enumerate(text_chunks):
        doc = Document(text, doc_id=f"doc_id_{i}")
        doc_chunks.append(doc)

    for doc_chunk in doc_chunks:
        index.insert(doc_chunk)

    #documents = SimpleDirectoryReader(filedirdata).load_data()
    #index = GPTSimpleVectorIndex(documents)

    #index2 = GPTSimpleVectorIndex.load_from_disk( directory + "/index2.json" )

    #index.update(index2)
    index.save_to_disk(fileindex)

if __name__ == "__main__":
    directory = sys.argv[1]
    textupdate = sys.argv[2]
    main(directory, textupdate)

dranastos avatar Feb 07 '23 21:02 dranastos

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
Traceback (most recent call last):
  File "C:\ai3\gpt_index\examples\update.py", line 45, in <module>
    main(directory, textupdate)
  File "C:\ai3\gpt_index\examples\update.py", line 22, in main
    for i, text in enumerate(text_chunks):
NameError: name 'text_chunks' is not defined. Did you mean: 'doc_chunks'?

dranastos avatar Feb 07 '23 21:02 dranastos

it looks like you need to replace text_chunks with doc_chunks, which is the variable you defined

jerryjliu avatar Feb 07 '23 21:02 jerryjliu

i get this now

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
Traceback (most recent call last):
  File "C:\ai3\gpt_index\examples\update.py", line 45, in <module>
    main(directory, textupdate)
  File "C:\ai3\gpt_index\examples\update.py", line 23, in main
    doc = Document(text, doc_id=f"doc_id_{i}")
NameError: name 'Document' is not defined

dranastos avatar Feb 07 '23 21:02 dranastos

from gpt_index import Document

are you following a section in the documentation? i can update this to be more clear

jerryjliu avatar Feb 07 '23 21:02 jerryjliu

yes

here

https://gpt-index.readthedocs.io/en/latest/how_to/update.html

dranastos avatar Feb 07 '23 21:02 dranastos

It would help if that section had exact working code for update.

I don't understand how "embed_model = OpenAIEmbedding()" fits with the code that's there.

dranastos avatar Feb 07 '23 21:02 dranastos

I got it to run without any errors now.

But it's hanging at this:

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens

It's been like that for a few minutes.

dranastos avatar Feb 07 '23 21:02 dranastos

import os
import io
import sys

from gpt_index import GPTListIndex, SimpleDirectoryReader , Document
from IPython.display import Markdown, display
from gpt_index.embeddings.openai import OpenAIEmbedding

def main( directory, textupdate ):

    filedirdata =  directory + "/data/"

    fileindex = directory + '/index.json'
    fileindex2 = directory + '/index2.json'

    index = GPTListIndex([])

    doc_chunks = [fileindex,fileindex2]
    for i, text in enumerate(doc_chunks):
        doc = Document(text, doc_id=f"doc_id_{i}")
        doc_chunks.append(doc)

    for doc_chunk in doc_chunks:
        index.insert(doc_chunk)

    #documents = SimpleDirectoryReader(filedirdata).load_data()
    #index = GPTSimpleVectorIndex(documents)

    #index2 = GPTSimpleVectorIndex.load_from_disk( directory + "/index2.json" )

    #index.update(index2)
    index.save_to_disk(fileindex)

if __name__ == "__main__":
    directory = sys.argv[1]
    textupdate = sys.argv[2]
    main(directory, textupdate)

dranastos avatar Feb 07 '23 21:02 dranastos

That's the updated code.

dranastos avatar Feb 07 '23 21:02 dranastos
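
A possible explanation for the hang, sketched in plain Python so it runs without gpt_index installed: the updated script loops over doc_chunks while appending to that same list, so the loop keeps finding new items and never finishes. It also wraps the two index file paths themselves, not the text stored in those files. The helper names wrap_in_place and wrap_separately, the limit cap, and the dict used in place of Document are assumptions made for this illustration and are not part of the library.

```python
from itertools import islice


def wrap_in_place(items, limit=10):
    """Mimic the script's loop: iterate over a list while appending to it.

    `limit` is a safety cap added for this demo only. Without it the loop
    would never end, because every append gives the iterator one more item.
    """
    steps = 0
    for item in islice(items, limit):
        items.append({"text": item})  # grows the list being iterated
        steps += 1
    return steps


def wrap_separately(texts):
    """Read from one list and collect the wrapped documents in another."""
    docs = []
    for i, text in enumerate(texts):
        # A dict stands in for gpt_index's Document here (assumption).
        docs.append({"doc_id": f"doc_id_{i}", "text": text})
    return docs


paths = ["index.json", "index2.json"]
steps = wrap_in_place(paths, limit=10)
print(steps, len(paths))  # the loop only stopped because of the cap

docs = wrap_separately(["first text chunk", "second text chunk"])
print(len(docs))
```

Keeping the input texts and the output documents in separate lists, as in the second helper, matches the shape of the loop the script was adapted from, where text_chunks is the input and doc_chunks starts empty.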

i want to basically merge the information of two indexes

dranastos avatar Feb 07 '23 21:02 dranastos

a base index, and then the updated information added to that index

dranastos avatar Feb 07 '23 21:02 dranastos

@jerryjliu ??

dranastos avatar Feb 08 '23 16:02 dranastos

@dranastos we support this through composability. You should try something like the following:

index1 = <whatever your first index is>
index2 = <whatever your second index is>

index = GPTListIndex([index1, index2])

When you query the list index, we will first query the subindices, and use the list index to combine the answers from each subindex.

Also re:

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens

are you sure it's hanging? the list index is supposed to output this when first built (it doesn't call any LLM or embedding APIs during index construction)

jerryjliu avatar Feb 13 '23 04:02 jerryjliu
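
To make the composability idea above concrete, here is a toy sketch of the query flow it describes: ask each sub-index first, then let the parent combine the answers. It does not use gpt_index at all. ToySubIndex, ToyListIndex, the keyword matching, and the " | " joiner are all hypothetical stand-ins chosen for illustration; the real library uses an LLM for both steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySubIndex:
    """Hypothetical stand-in for one index: a named bag of text snippets."""
    name: str
    texts: List[str]

    def query(self, question: str) -> str:
        # Naive keyword match standing in for a real LLM-backed query.
        words = question.lower().split()
        hits = [t for t in self.texts if any(w in t.lower() for w in words)]
        return " ".join(hits)


@dataclass
class ToyListIndex:
    """Hypothetical parent index that holds sub-indices, not raw text."""
    children: List[ToySubIndex]

    def query(self, question: str) -> str:
        # Step 1: query every sub-index independently.
        answers = [child.query(question) for child in self.children]
        # Step 2: combine the non-empty answers into one response.
        return " | ".join(a for a in answers if a)


base = ToySubIndex("base", ["Paris is the capital of France."])
update = ToySubIndex("update", ["France uses the euro."])
combined = ToyListIndex([base, update])
print(combined.query("france"))
```

In this arrangement neither sub-index is modified. The "merge" happens at query time, which is why the base index and the update can be built and saved separately.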

Going to close this since it's not an issue. Please join the discord community (https://discord.gg/dGcwcsnxhU) for better support!

Disiok avatar Mar 16 '23 17:03 Disiok