llama_index Updating an Index.

Does anyone have working sample code for updating an index with new text?

I tried the code in the documentation and it doesn't work. It gives me errors I can't resolve.

Ideally the code should be able to append new text to the original index.
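For illustration only, the load, append, and save cycle being asked for can be modeled in plain Python. This sketch stores a hypothetical toy index as a JSON list of `{doc_id, text}` records. `load_index`, `append_text`, and `save_index` are made-up helper names, not gpt_index APIs; the gpt_index code in this thread uses `index.insert(...)` and `index.save_to_disk(...)` instead.

```python
import json
import os
import tempfile
from typing import Dict, List

Record = Dict[str, str]


def load_index(path: str) -> List[Record]:
    # Hypothetical helper: start empty if the base index was never saved.
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def append_text(index: List[Record], text: str) -> None:
    # Hypothetical helper: new text becomes one more record; ids continue
    # the doc_id_{i} scheme used in the thread's code.
    index.append({"doc_id": f"doc_id_{len(index)}", "text": text})


def save_index(index: List[Record], path: str) -> None:
    # Hypothetical helper: overwrite the file with the full, updated list.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.json")

    base = load_index(path)            # empty on the first run
    append_text(base, "original text")
    save_index(base, path)

    updated = load_index(path)         # reload the saved base index
    append_text(updated, "new text")   # append the update
    save_index(updated, path)

    print([r["doc_id"] for r in load_index(path)])  # ['doc_id_0', 'doc_id_1']
```

Reloading the saved index before each append is what keeps the original records intact; an update only ever adds records.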

Feb 07 '23 18:02 dranastos

Hi @dranastos, could you paste the code / stack trace?

Feb 07 '23 21:02 jerryjliu

import os
import io
import sys

from gpt_index import GPTListIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

def main(directory, textupdate):

    filedirdata = directory + "/data/"

    fileindex = directory + '/index.json'
    fileindex2 = directory + '/index2.json'

    index = GPTListIndex([])

    doc_chunks = [fileindex, fileindex2]
    for i, text in enumerate(text_chunks):
        doc = Document(text, doc_id=f"doc_id_{i}")
        doc_chunks.append(doc)

    for doc_chunk in doc_chunks:
        index.insert(doc_chunk)

    #documents = SimpleDirectoryReader(filedirdata).load_data()
    #index = GPTSimpleVectorIndex(documents)

    #index2 = GPTSimpleVectorIndex.load_from_disk( directory + "/index2.json" )

    #index.update(index2)
    index.save_to_disk(fileindex)

if __name__ == "__main__":
    directory = sys.argv[1]
    textupdate = sys.argv[2]
    main(directory, textupdate)

Feb 07 '23 21:02 dranastos

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
Traceback (most recent call last):
  File "C:\ai3\gpt_index\examples\update.py", line 45, in <module>
    main(directory, textupdate)
  File "C:\ai3\gpt_index\examples\update.py", line 22, in main
    for i, text in enumerate(text_chunks):
NameError: name 'text_chunks' is not defined. Did you mean: 'doc_chunks'?

Feb 07 '23 21:02 dranastos

It looks like you need to replace text_chunks with doc_chunks, which is the variable you defined.

Feb 07 '23 21:02 jerryjliu

I get this now:

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
Traceback (most recent call last):
  File "C:\ai3\gpt_index\examples\update.py", line 45, in <module>
    main(directory, textupdate)
  File "C:\ai3\gpt_index\examples\update.py", line 23, in main
    doc = Document(text, doc_id=f"doc_id_{i}")
NameError: name 'Document' is not defined

Feb 07 '23 21:02 dranastos

from gpt_index import Document

Are you following a section in the documentation? I can update it to be clearer.

Feb 07 '23 21:02 jerryjliu

Yes, here:

https://gpt-index.readthedocs.io/en/latest/how_to/update.html

Feb 07 '23 21:02 dranastos

It would help if that section had exact, working code for updating an index.

Also, "embed_model = OpenAIEmbedding()" isn't recognized with the code that's there.

Feb 07 '23 21:02 dranastos

I got it to run without any errors now, but it's hanging at this:

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens

It's been like that for a few minutes.

Feb 07 '23 21:02 dranastos

import os
import io
import sys

from gpt_index import GPTListIndex, SimpleDirectoryReader, Document
from IPython.display import Markdown, display
from gpt_index.embeddings.openai import OpenAIEmbedding

def main(directory, textupdate):

    filedirdata = directory + "/data/"

    fileindex = directory + '/index.json'
    fileindex2 = directory + '/index2.json'

    index = GPTListIndex([])

    doc_chunks = [fileindex, fileindex2]
    for i, text in enumerate(doc_chunks):
        doc = Document(text, doc_id=f"doc_id_{i}")
        doc_chunks.append(doc)

    for doc_chunk in doc_chunks:
        index.insert(doc_chunk)

    #documents = SimpleDirectoryReader(filedirdata).load_data()
    #index = GPTSimpleVectorIndex(documents)

    #index2 = GPTSimpleVectorIndex.load_from_disk( directory + "/index2.json" )

    #index.update(index2)
    index.save_to_disk(fileindex)

if __name__ == "__main__":
    directory = sys.argv[1]
    textupdate = sys.argv[2]
    main(directory, textupdate)

Feb 07 '23 21:02 dranastos

That's the updated code.

Feb 07 '23 21:02 dranastos

Basically, I want to merge the information of two indexes:

Feb 07 '23 21:02 dranastos

a base index, and then the updated information added to that index.

Feb 07 '23 21:02 dranastos

@jerryjliu ??

Feb 08 '23 16:02 dranastos

@dranastos we support this through composability. You should try something like the following:

index1 = <whatever your first index is>
index2 = <whatever your second index is>

index = GPTListIndex([index1, index2])

When you query the list index, we will first query the subindices, and use the list index to combine the answers from each subindex.
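A toy, stdlib-only model of the composition behaviour described here: the outer list index queries each subindex, then combines their answers. `ToyIndex` and `ToyListIndex` are hypothetical stand-ins, not gpt_index classes. A naive keyword match replaces LLM answering, and the combine step is a plain string join rather than whatever combination logic the real list index uses.

```python
from typing import List


class ToyIndex:
    """Hypothetical stand-in for one index built over a few text chunks."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = list(chunks)

    def insert(self, text: str) -> None:
        # Adding new text to an index is just one more chunk here.
        self.chunks.append(text)

    def query(self, keyword: str) -> List[str]:
        # Naive case-insensitive keyword match instead of LLM answering.
        return [c for c in self.chunks if keyword.lower() in c.lower()]


class ToyListIndex:
    """Hypothetical composed index whose entries are other indices."""

    def __init__(self, subindices: List[ToyIndex]) -> None:
        self.subindices = list(subindices)

    def query(self, keyword: str) -> str:
        answers: List[str] = []
        for sub in self.subindices:      # 1. query every subindex in order
            answers.extend(sub.query(keyword))
        return " | ".join(answers)        # 2. combine (plain join here)


base = ToyIndex(["The base index covers release 1.0."])
update = ToyIndex(["The update covers release 1.1."])
combined = ToyListIndex([base, update])
print(combined.query("release"))
# The base index covers release 1.0. | The update covers release 1.1.
```

The design point this models is that the base index is never rewritten: the update lives in its own subindex, and the composed index holds references to both, so later inserts into either one show up in combined queries.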

Also re:

[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens

Are you sure it's hanging? The list index is supposed to output this when it's first built (it doesn't call any LLM or embedding APIs during index construction).
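Separately from the log output, the updated code above contains a loop that, as written, does not terminate: `for i, text in enumerate(doc_chunks)` appends a new `Document` to `doc_chunks` on every pass, so there is always one more item to visit. That would keep the script busy right after `GPTListIndex([])` prints its two log lines. Note also that the seed list holds the paths `fileindex` and `fileindex2`, so each `Document` would wrap the path string rather than the contents of those files. The stdlib-only sketch below reproduces the pattern with a plain list and shows the usual fix of reading from one list and collecting into another. `make_doc` is a hypothetical stand-in for `Document(text, doc_id=...)`, and the `max_steps` guard exists only so the demo stops.

```python
from typing import Dict, List


def make_doc(text: str, doc_id: str) -> Dict[str, str]:
    # Hypothetical stand-in for Document(text, doc_id=...); a plain dict here.
    return {"doc_id": doc_id, "text": text}


def buggy_wrap(doc_chunks: List[object], max_steps: int = 10) -> int:
    # Mirrors the loop in the updated code: it appends to the list it is
    # iterating over, so enumerate() keeps finding new items.
    steps = 0
    for i, text in enumerate(doc_chunks):
        doc_chunks.append(make_doc(str(text), f"doc_id_{i}"))
        steps += 1
        if steps >= max_steps:  # guard added only so this demo stops
            break
    return steps


def fixed_wrap(text_chunks: List[str]) -> List[Dict[str, str]]:
    # Iterate over the source texts and collect results in a separate list.
    doc_chunks = []
    for i, text in enumerate(text_chunks):
        doc_chunks.append(make_doc(text, f"doc_id_{i}"))
    return doc_chunks


if __name__ == "__main__":
    print(buggy_wrap(["a", "b"]))  # 10 (stopped by the guard, not by the loop)
    docs = fixed_wrap(["base text", "new text"])
    print([d["doc_id"] for d in docs])  # ['doc_id_0', 'doc_id_1']
```

In the thread's terms, the fix is to keep the source texts (for example, file contents read from disk) in their own list and build `doc_chunks` from it, instead of iterating over `doc_chunks` itself.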

Feb 13 '23 04:02 jerryjliu

Going to close this since it's not an issue. Please join the Discord community (https://discord.gg/dGcwcsnxhU) for better support!

Mar 16 '23 17:03 Disiok
