Inserting document with same `doc_id`

Open TmLev opened this issue 2 years ago • 5 comments

How would GPTSimpleVectorIndex react if I were to insert a document with doc_id of an already present document?

from llama_index import Document, GPTSimpleVectorIndex

index = GPTSimpleVectorIndex()

document = Document(..., doc_id="id")
index.insert(document)

updated_document = Document(..., doc_id="id")

#     👇 What will happen here?
index.insert(updated_document)

Will it just update the document/nodes/vectors? Is it even safe to do so?

TmLev, Feb 21 '23 17:02

Also related: is there a way to check whether a document with some doc_id is present in the index? I presume it's the role of GPTSimpleVectorIndex().docstore.document_exists(doc_id)?

TmLev, Feb 21 '23 18:02

It seems that GPTSimpleVectorIndex ignores the user-specified Document().doc_id: docstore.docs contains only UUID v4 IDs and none of the user-provided ones, although nodes do have ref_doc_id set to the user's doc_id. Is there a reason for this behaviour? I would very much like to query documents by the IDs that I provide rather than the ones llama_index assigns, or at least by ref_doc_id.
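To make the ref_doc_id idea concrete, here is a minimal sketch of a lookup built over nodes. The `Node` dataclass and `group_by_ref_doc_id` helper are hypothetical stand-ins written for illustration; they are not llama_index classes or functions, and the real node objects may expose these fields differently.

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass
class Node:
    """Hypothetical stand-in for an index node; not a llama_index class."""
    node_id: str      # internally assigned UUID v4
    ref_doc_id: str   # doc_id supplied by the user on the source document
    text: str


def group_by_ref_doc_id(nodes):
    """Map each user-supplied doc_id to the nodes that reference it."""
    by_doc = {}
    for node in nodes:
        by_doc.setdefault(node.ref_doc_id, []).append(node)
    return by_doc


nodes = [
    Node(str(uuid4()), "id", "first chunk"),
    Node(str(uuid4()), "id", "second chunk"),
    Node(str(uuid4()), "other", "unrelated chunk"),
]
lookup = group_by_ref_doc_id(nodes)
```

With a mapping like this, `lookup["id"]` would return both chunks of the document inserted with doc_id="id", regardless of the UUIDs assigned internally.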

TmLev, Feb 22 '23 17:02

Hi @TmLev, we do currently have an update function, which deletes the doc and then inserts it again. Is that what you'd be looking for?

An alternative UX we're considering is to make our insert function an upsert instead.

jerryjliu, Feb 27 '23 06:02

> Hi @TmLev, we do currently have an update function - which deletes the doc then inserts. Is that what you'd be looking for?

My original question is "what would happen if I were to insert a document with an ID of an already present document?"

TmLev, Feb 27 '23 11:02

> My original question is "what would happen if I were to insert a document with an ID of an already present document?"

Yeah, I guess my point is that in those cases the insert call should error, and you should really just be using the update function instead.
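As a rough illustration of the semantics discussed in this thread (insert errors on a duplicate doc_id, update deletes then inserts, upsert does whichever applies), here is a toy sketch. `TinyIndex` and all of its methods are assumptions made up for this example; it is a plain dict wrapper and does not reflect how llama_index implements insert or update.

```python
class TinyIndex:
    """Toy in-memory index keyed by doc_id; hypothetical, for illustration only."""

    def __init__(self):
        self._docs = {}

    def document_exists(self, doc_id):
        return doc_id in self._docs

    def get(self, doc_id):
        return self._docs[doc_id]

    def insert(self, doc_id, text):
        # Strict insert: refuse to silently overwrite an existing document.
        if doc_id in self._docs:
            raise ValueError(f"doc_id {doc_id!r} already present; use update()")
        self._docs[doc_id] = text

    def delete(self, doc_id):
        # Deleting a missing doc_id is a no-op in this sketch.
        self._docs.pop(doc_id, None)

    def update(self, doc_id, text):
        # Update as described in the thread: delete the doc, then insert.
        self.delete(doc_id)
        self.insert(doc_id, text)

    def upsert(self, doc_id, text):
        # Upsert: update when the doc_id exists, otherwise insert.
        if self.document_exists(doc_id):
            self.update(doc_id, text)
        else:
            self.insert(doc_id, text)
```

In this sketch a second `insert("id", ...)` raises `ValueError`, while `update("id", ...)` or `upsert("id", ...)` replaces the stored text.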

jerryjliu, Feb 28 '23 08:02

@TmLev heads up, going to close this issue for now unless you had additional issues to raise (feel free to reopen)

jerryjliu, Mar 06 '23 21:03