llama_index
Updating an Index.
Does anyone have any working sample code for updating an index with new text?
I tried the code in the documentation and it doesn't work; it gives me errors I can't resolve.
Ideally the code should be able to append new text to the original index.
hi @dranastos could you paste the code / stack trace?
import os
import io
import sys

from gpt_index import GPTListIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

def main(directory, textupdate):
    filedirdata = directory + "/data/"
    fileindex = directory + '/index.json'
    fileindex2 = directory + '/index2.json'

    index = GPTListIndex([])
    doc_chunks = [fileindex, fileindex2]
    for i, text in enumerate(text_chunks):
        doc = Document(text, doc_id=f"doc_id_{i}")
        doc_chunks.append(doc)
    for doc_chunk in doc_chunks:
        index.insert(doc_chunk)

    #documents = SimpleDirectoryReader(filedirdata).load_data()
    #index = GPTSimpleVectorIndex(documents)
    #index2 = GPTSimpleVectorIndex.load_from_disk( directory + "/index2.json" )
    #index.update(index2)

    index.save_to_disk(fileindex)

if __name__ == "__main__":
    directory = sys.argv[1]
    textupdate = sys.argv[2]
    main(directory, textupdate)
[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
Traceback (most recent call last):
  File "C:\ai3\gpt_index\examples\update.py", line 45, in <module>
    main(directory, textupdate)
  File "C:\ai3\gpt_index\examples\update.py", line 22, in main
    for i, text in enumerate(text_chunks):
NameError: name 'text_chunks' is not defined. Did you mean: 'doc_chunks'?
it looks like you need to replace text_chunks
with doc_chunks
which is the variable you defined
i get this now
[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
Traceback (most recent call last):
  File "C:\ai3\gpt_index\examples\update.py", line 45, in <module>
    main(directory, textupdate)
  File "C:\ai3\gpt_index\examples\update.py", line 23, in main
    doc = Document(text, doc_id=f"doc_id_{i}")
NameError: name 'Document' is not defined
from gpt_index import Document
are you following a section in the documentation? i can update this to be more clear
yes
here
https://gpt-index.readthedocs.io/en/latest/how_to/update.html
It would help if there was exact working code for update in that section.
I don't understand how "embed_model = OpenAIEmbedding()" fits with the code that's there.
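(For reference, the pattern that documentation section describes boils down to something like the minimal sketch below. It assumes text_chunks is a list of plain strings you supply yourself; the file name and command-line arguments are placeholders, and the embed_model line from the docs is not needed for a plain GPTListIndex.)

import sys

from gpt_index import GPTListIndex, Document

def main(directory, textupdate):
    # start from an empty list index
    index = GPTListIndex([])

    # wrap each chunk of new text in a Document and insert it into the index
    text_chunks = [textupdate]  # list of plain strings to append
    for i, text in enumerate(text_chunks):
        index.insert(Document(text, doc_id=f"doc_id_{i}"))

    # persist the updated index
    index.save_to_disk(directory + "/index.json")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])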
i got it to run without any errors now.
But it's hanging at this:
[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
been like that for a few minutes
import os
import io
import sys

from gpt_index import GPTListIndex, SimpleDirectoryReader, Document
from gpt_index.embeddings.openai import OpenAIEmbedding
from IPython.display import Markdown, display

def main(directory, textupdate):
    filedirdata = directory + "/data/"
    fileindex = directory + '/index.json'
    fileindex2 = directory + '/index2.json'

    index = GPTListIndex([])
    doc_chunks = [fileindex, fileindex2]
    for i, text in enumerate(doc_chunks):
        doc = Document(text, doc_id=f"doc_id_{i}")
        doc_chunks.append(doc)
    for doc_chunk in doc_chunks:
        index.insert(doc_chunk)

    #documents = SimpleDirectoryReader(filedirdata).load_data()
    #index = GPTSimpleVectorIndex(documents)
    #index2 = GPTSimpleVectorIndex.load_from_disk( directory + "/index2.json" )
    #index.update(index2)

    index.save_to_disk(fileindex)

if __name__ == "__main__":
    directory = sys.argv[1]
    textupdate = sys.argv[2]
    main(directory, textupdate)
that's the updated code.
i basically want to merge the information of two indexes:
a base index, and then the updated information added to that index
@jerryjliu ??
@dranastos we support this through composability. You should try something like the following:
index1 = <whatever your first index is>
index2 = <whatever your second index is>
index = GPTListIndex([index1, index2])
When you query the list index, we will first query the subindices, and use the list index to combine the answers from each subindex.
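Put together, that would look something like the following (a minimal sketch following the suggestion above, assuming both indices were previously saved to disk as in the script earlier in the thread; the file names and the query string are placeholders):

from gpt_index import GPTListIndex, GPTSimpleVectorIndex

# load the base index and the index holding the updated information
index1 = GPTSimpleVectorIndex.load_from_disk("index.json")
index2 = GPTSimpleVectorIndex.load_from_disk("index2.json")

# compose the two subindices under a single list index
index = GPTListIndex([index1, index2])

# the query is routed to each subindex, and the list index combines the answers
response = index.query("What changed in the latest update?")
print(response)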
Also re:
[build_index_from_documents] Total LLM token usage: 0 tokens
[build_index_from_documents] Total embedding token usage: 0 tokens
are you sure it's hanging? the list index is supposed to output this when first built (it doesn't call any LLM or embedding api's during index construction)
Going to close this since it's not an issue. Please join the discord community (https://discord.gg/dGcwcsnxhU) for better support!