private-gpt Ingest new documents by updating old vector store instead of starting from scratch.

Can you add (ingest) new documents without having to create the vector database from scratch? I mean, will the vector database only be updated with the new indexes, or will it have to be created from the start?

Example: ingest the "state of the union 2023.txt" file and, once that is done, add "flash attention paper.pdf". Will adding the second file restart computing vectors for all documents, or will the vector store only be updated with the new vectors and index?

May 16 '23 06:05 superchargez

See the add_documents method of Chroma (the LangChain vector store wrapper), which adds documents to an existing store:

def add_documents(
    documents: List[Document],
    **kwargs: Any
) -> List[str]
Run more documents through the embeddings and add to the vectorstore.

Args:
    documents (List[Document]): Documents to add to the vectorstore.

Returns:
    List[str]: List of IDs of the added texts.
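
To illustrate what "add to the vectorstore" means for the question above, here is a minimal sketch using a toy in-memory store. It is not Chroma's implementation: ToyVectorStore, its hash-based _embed, and the embed_calls counter are hypothetical names made up for this example, and plain strings stand in for Document objects. The point it shows is that an add_documents-style call only embeds the documents passed to it, so vectors already in the store are not recomputed.

```python
import hashlib
import uuid
from typing import Dict, List


class ToyVectorStore:
    """Illustrative in-memory stand-in for a vector store (not Chroma)."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}
        self.embed_calls = 0  # how many texts have been embedded so far

    def _embed(self, text: str) -> List[float]:
        # Hypothetical embedding: four floats derived from a SHA-256 digest.
        self.embed_calls += 1
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [b / 255.0 for b in digest[:4]]

    def add_documents(self, documents: List[str]) -> List[str]:
        """Embed only the given documents, store them, and return their IDs."""
        ids = []
        for text in documents:
            doc_id = str(uuid.uuid4())
            self.vectors[doc_id] = self._embed(text)
            ids.append(doc_id)
        return ids


store = ToyVectorStore()

# First ingest: one document, one embedding computed.
first_ids = store.add_documents(["contents of state of the union 2023.txt"])
calls_after_first = store.embed_calls

# Later ingest: only the new document is embedded; the first vector is untouched.
second_ids = store.add_documents(["contents of flash attention paper.pdf"])
print(store.embed_calls - calls_after_first)  # prints 1
```

With the real library, the analogous step would presumably be to reopen the persisted store and call add_documents on it with only the new documents, rather than rebuilding it from all files. The exact constructor arguments depend on the LangChain version in use.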

May 16 '23 06:05 maozdemir

https://github.com/imartinez/privateGPT/pull/201

May 16 '23 09:05 maozdemir