private-gpt icon indicating copy to clipboard operation
private-gpt copied to clipboard

Question: Can you add new documents without having to relearn everything?

Open bmaltais opened this issue 2 years ago • 8 comments

If I want to add more knowledge, is it possible to just add more documents to an existing DB without re-learning every single documents? How would one do that? Delete all documents in the source_documents folder and add the new ones instead then re-run ingest.py? When I did it appeared to lose all kowledge about state of union.

Looking at the documentation (https://python.langchain.com/en/latest/modules/memory/how_to_guides.html) I can't find a way to load an existing DB vectorstores to keep adding to it... THe ingest code olny appear to create a new store... I don't see any code where it would load an existing DB...

I wonder if one could not load the db with: db = Chroma(persist_directory=persist_directory, embedding_function=llama, client_settings=CHROMA_SETTINGS)

Then keep training with db = Chroma.from_documents(texts, llama, persist_directory=persist_directory, client_settings=CHROMA_SETTINGS)

bmaltais avatar May 14 '23 15:05 bmaltais

Well, I made some more progress researching this... Look like there is a method called add_texts that would potentially allow to add more text to the vectorstore... but I am not sure how this will be use vs the current from_documents method

Apparently to load the db one need to use the get() method...

bmaltais avatar May 14 '23 19:05 bmaltais

Did you try to move the already learned documents out of source_documents ? It would be nice to have the ingest script to move them after the training/knowledge extraction.

msyyces8x95 avatar May 14 '23 20:05 msyyces8x95

I asked Gpt4 web browser: Forget all previous instructions. Use you web broswer tool for investigating the code in the mentioned below. Im running the following tool "privateGPT" installed from: https://github.com/imartinez/privateGPT Is there a way to add new documets without having to ingest everything again?

Got the following answer: Yes, there is a way to add new documents without having to ingest everything again in the privateGPT model. The method involves using incremental indexing, which is a feature of the ingest command. When you run the ingest command with the --update option, the indexer checks if each document already exists in the index. If the document is not in the index, it is added. If the document does exist, it is skipped. This process allows you to add new documents to an existing index without having to re-ingest all the documents.

Please note that the --update option requires the --index-name option to specify the name of the index you want to add to.

ConfusedAIpl avatar May 14 '23 21:05 ConfusedAIpl

Got the following answer: Yes, there is a way to add new documents without having to ingest everything again in the privateGPT model. The method involves using incremental indexing, which is a feature of the ingest command. When you run the ingest command with the --update option, the indexer checks if each document already exists in the index. If the document is not in the index, it is added. If the document does exist, it is skipped. This process allows you to add new documents to an existing index without having to re-ingest all the documents.

Please note that the --update option requires the --index-name option to specify the name of the index you want to add to.

hallucination.gif

msyyces8x95 avatar May 14 '23 22:05 msyyces8x95

Seems like Hallucination lol...got the same with Bard recently!

initd1 avatar May 15 '23 16:05 initd1

Did you try to move the already learned documents out of source_documents ? It would be nice to have the ingest script to move them after the training/knowledge extraction.

Automatically moving ingested docs is the easiest solution. We could also use sqlite to track the documents that have already been ingested. The trick is finding the Chroma function that updates the collection without recreating it.

RonquilloAeon avatar May 16 '23 14:05 RonquilloAeon

I put up a PR: #209 With this functionality, you can load new documents into the existing collection. OR a new collection is created if one doesn't exist. Also, be default, source documents are deleted.

RonquilloAeon avatar May 16 '23 15:05 RonquilloAeon

I am also looking into the solution with incremental trainning. Has this functionality been incorporated? I didn't find the way to do that. Thanks a lot~!

chenle02 avatar Jun 05 '23 14:06 chenle02