
Ingest new documents by updating old vector store instead of starting from scratch.

Open superchargez opened this issue 2 years ago • 2 comments

Can you add (ingest) new documents without having to create the vector database from scratch? I mean, will the vector database only be updated with the new indexes, or will it have to be created from the start?

Example: ingest the "state of the union 2023.txt" file and, once that is done, add "flash attention paper.pdf". Will this change cause the vectors for all documents to be recomputed, or will the vector store only be updated with the new vectors and index?

superchargez avatar May 16 '23 06:05 superchargez

See this method of Chroma:

def add_documents(
    documents: List[Document],
    **kwargs: Any
) -> List[str]
Run more documents through the embeddings and add to the vectorstore.

Args:
    documents (List[Document]): Documents to add to the vectorstore.

Returns:
    List[str]: List of IDs of the added texts.
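
To make the incremental idea concrete, here is a minimal sketch using only the Python standard library. It is a toy stand-in, not Chroma or privateGPT code: `Document`, `toy_embed`, and `ToyVectorStore` are hypothetical names, and the "embedding" is just a hash. Only the `add_documents` method name mirrors the one quoted above. Skipping content that is already stored is a choice made for this toy, not a claim about what Chroma does.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain-style document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Deterministic placeholder embedding (NOT a real embedding model)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


class ToyVectorStore:
    """In-memory stand-in for a vector store; this is not Chroma itself."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}
        self.embed_calls = 0  # how many texts have been embedded so far

    def add_documents(self, documents: List[Document]) -> List[str]:
        """Embed only the documents passed in and return their IDs."""
        ids = []
        for doc in documents:
            # Derive a stable ID from the content so repeats can be detected.
            doc_id = hashlib.sha256(doc.page_content.encode("utf-8")).hexdigest()[:16]
            if doc_id not in self.vectors:  # skip content that is already stored
                self.vectors[doc_id] = toy_embed(doc.page_content)
                self.embed_calls += 1
            ids.append(doc_id)
        return ids


store = ToyVectorStore()
# First ingest: one document is embedded.
first_ids = store.add_documents([Document("state of the union 2023 text")])
# Later ingest: only the new document is embedded; the first is left untouched.
second_ids = store.add_documents([Document("flash attention paper text")])
print(len(store.vectors), store.embed_calls)
```

In this toy, the second call embeds one new text and leaves the earlier vector as it was, which is the behaviour the question asks about.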

maozdemir avatar May 16 '23 06:05 maozdemir

https://github.com/imartinez/privateGPT/pull/201

maozdemir avatar May 16 '23 09:05 maozdemir