How do I merge different vectorstores?

Open mangled-data opened this issue 2 years ago • 5 comments

I am loading mini-batches into separate stores, like this: vectorstores = [Chroma(persist_directory=x, embedding_function=embedding) for x in dirs]

How can I merge them?

mangled-data, Mar 24 '23 15:03

@mangled-data so you have many Chroma directories? Can you explain the use case some more? This is an uncommon approach. Because the indexes are separate, you would need to merge them manually right now.
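
As a rough illustration of what a manual merge could look like: the sketch below copies every record from several source collections into one target, using only the get(...) / add(...) shape that Chroma collections expose. The helper name merge_collections and the InMemoryCollection stand-in are hypothetical and exist only so the example runs without Chroma installed. With the LangChain wrapper, the underlying collection is assumed to be reachable through the private _collection attribute, which may change between versions, and duplicate ids across batches are assumed not to occur.

```python
from typing import Any, Dict, Iterable, List, Optional


class InMemoryCollection:
    """Hypothetical stand-in mimicking the get/add shape of a Chroma collection."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}

    def get(self, include: Optional[List[str]] = None) -> Dict[str, Any]:
        # Return parallel lists keyed like a Chroma get() result.
        ids = list(self.rows)
        return {
            "ids": ids,
            "embeddings": [self.rows[i]["embedding"] for i in ids],
            "documents": [self.rows[i]["document"] for i in ids],
            "metadatas": [self.rows[i]["metadata"] for i in ids],
        }

    def add(self, ids, embeddings, documents, metadatas) -> None:
        for i, e, d, m in zip(ids, embeddings, documents, metadatas):
            if i in self.rows:
                raise ValueError(f"duplicate id: {i}")
            self.rows[i] = {"embedding": e, "document": d, "metadata": m}


def merge_collections(target: Any, sources: Iterable[Any], batch_size: int = 1000) -> int:
    """Copy all records from each source into target; return the number copied."""
    copied = 0
    for source in sources:
        # Fetch stored vectors too, so nothing has to be re-embedded.
        data = source.get(include=["embeddings", "documents", "metadatas"])
        ids = data["ids"]
        for start in range(0, len(ids), batch_size):
            end = start + batch_size
            target.add(
                ids=ids[start:end],
                embeddings=data["embeddings"][start:end],
                documents=data["documents"][start:end],
                metadatas=data["metadatas"][start:end],
            )
            copied += len(ids[start:end])
    return copied


# Demonstration with the stand-in collections (two "mini-batch" stores).
a, b = InMemoryCollection(), InMemoryCollection()
a.add(ids=["a1"], embeddings=[[0.1, 0.2]], documents=["first"], metadatas=[{"batch": 0}])
b.add(
    ids=["b1", "b2"],
    embeddings=[[0.3, 0.4], [0.5, 0.6]],
    documents=["second", "third"],
    metadatas=[{"batch": 1}, {"batch": 1}],
)
merged = InMemoryCollection()
total = merge_collections(merged, [a, b], batch_size=1)
```

With real stores, the sources would be something like [vs._collection for vs in vectorstores] and the target a collection in a fresh persist directory. Copying in batches keeps each add call small; the batch size here is arbitrary.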

jeffchuber, Mar 29 '23 20:03

@jeffchuber I create vectors in mini-batches and cache them, so that if the run is interrupted I can restart from where I left off. What I do with FAISS is roughly the following:

for idx, doc_arr in enumerate(mini_batches):
    if no_cache:
        if self.method == 'FAISS':
            # Build a FAISS index for this mini-batch and cache it on disk.
            db = FAISS.from_documents(doc_arr, embeddings)
            db.save_local(persist_directory)
            vectordb_arr.append(db)
        else:
            # Build a Chroma store for this mini-batch and force a write to disk.
            vectordb = Chroma.from_documents(documents=doc_arr, embedding=embedding, persist_directory=persist_directory)
            vectordb.persist()
    else:
        print(f"Found cached embeddings for {uniq_tag}")
        vectordb_arr.append(vectordb)

# Fold all per-batch FAISS indexes into the first one.
db = vectordb_arr[0]
for vectordb in vectordb_arr[1:]:
    db.merge_from(vectordb)
                
How do I merge them manually? Sorry if I missed this in the documentation.

mangled-data, Mar 29 '23 20:03

@mangled-data just confirming, is this using LangChain?

jeffchuber, Mar 29 '23 20:03

Yes, that is correct.

mangled-data, Mar 29 '23 21:03

@mangled-data try calling persist directly (https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py#L189). Otherwise, persist won't be called until the object is deleted from memory; calling .persist() forces it. Then, when you load the store again, it will boot up with the saved information.

Let me know if this works for you!

jeffchuber, Mar 29 '23 22:03

Closing this as stale, but feel free to reopen, @mangled-data, if there is still an issue!

HammadB, May 11 '23 23:05