DOC: Bug in loading Chroma from disk (vectorstores/integrations/chroma)
Issue with current documentation:
https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/chroma.html#basic-example-including-saving-to-disk
Environment
- macOS
- Python 3.10.9
- langchain 0.0.228
- chromadb 0.3.26
Input file: https://github.com/hwchase17/langchain/blob/v0.0.228/docs/extras/modules/state_of_the_union.txt
Procedure
- Run the following Python script, based on https://github.com/hwchase17/langchain/blob/v0.0.228/docs/extras/modules/data_connection/vectorstores/integrations/chroma.ipynb (lines prefixed with - and + are a diff against the notebook code):
# import
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
# load the document
loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
# split it into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
# load it into Chroma
db = Chroma.from_documents(docs, embedding_function)
# query it
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
# print results
print(docs[0].page_content)
# save to disk
db2 = Chroma.from_documents(docs, embedding_function, persist_directory="./chroma_db")
db2.persist()
-docs = db.similarity_search(query)
+docs = db2.similarity_search(query)
# load from disk
db3 = Chroma(persist_directory="./chroma_db")
-docs = db.similarity_search(query)
+docs = db3.similarity_search(query) # ValueError raised
print(docs[0].page_content)
Expected behavior
print(docs[0].page_content) prints the top search result returned by db3.
Actual behavior
ValueError: You must provide embeddings or a function to compute them
Traceback (most recent call last):
File "/.../issue_report.py", line 35, in <module>
docs = db3.similarity_search(query)
File "/.../venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 174, in similarity_search
docs_and_scores = self.similarity_search_with_score(query, k, filter=filter)
File "/.../venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 242, in similarity_search_with_score
results = self.__query_collection(
File "/.../venv/lib/python3.10/site-packages/langchain/utils.py", line 55, in wrapper
return func(*args, **kwargs)
File "/.../venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 121, in __query_collection
return self._collection.query(
File "/.../venv/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 209, in query
raise ValueError(
ValueError: You must provide embeddings or a function to compute them
Idea or request for content:
Fixed by specifying the embedding_function parameter when loading from disk. The documentation example should be updated accordingly:
-db3 = Chroma(persist_directory="./chroma_db")
+db3 = Chroma(persist_directory="./chroma_db", embedding_function=embedding_function)
docs = db3.similarity_search(query)
print(docs[0].page_content)
(Added) ref: https://github.com/hwchase17/langchain/blob/v0.0.228/langchain/vectorstores/chroma.py#L62
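To illustrate why the reload fails, here is a minimal stdlib-only sketch. It does not use langchain or chromadb: ToyStore, toy_embed, and demo are hypothetical names made up for this example, and the assumption is simply that a store persists texts and vectors but not the function that produced them. Under that assumption, a store reloaded without an embedding function has data but cannot embed a new query, which matches the ValueError in the traceback above.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

Embedder = Callable[[str], List[float]]


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in embedder: vowel counts as a 5-dimensional vector."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeiou"]


class ToyStore:
    """Hypothetical store: texts and vectors are written to disk, the embedder is not."""

    def __init__(self, persist_directory: str,
                 embedding_function: Optional[Embedder] = None):
        self._path = Path(persist_directory) / "store.json"
        self._embed = embedding_function
        # Reload previously persisted rows, if any.
        self._rows = json.loads(self._path.read_text()) if self._path.exists() else []

    def _require_embedder(self) -> Embedder:
        if self._embed is None:
            raise ValueError("You must provide embeddings or a function to compute them")
        return self._embed

    def add_texts(self, texts: List[str]) -> None:
        embed = self._require_embedder()
        self._rows.extend({"text": t, "vector": embed(t)} for t in texts)

    def persist(self) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(self._rows))

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        # The query must be embedded at search time, so the embedder is still needed.
        q = self._require_embedder()(query)

        def squared_distance(row: dict) -> float:
            return sum((a - b) ** 2 for a, b in zip(q, row["vector"]))

        return [row["text"] for row in sorted(self._rows, key=squared_distance)[:k]]


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        saved = ToyStore(tmp, embedding_function=toy_embed)
        saved.add_texts(["banana", "sky"])
        saved.persist()

        # Reload without the embedder: the rows load, but the query cannot be embedded.
        broken = ToyStore(tmp)
        try:
            broken.similarity_search("bandana")
            error = None
        except ValueError as exc:
            error = str(exc)

        # Reload with the same embedder that was used when saving.
        fixed = ToyStore(tmp, embedding_function=toy_embed)
        return {"error": error, "hit": fixed.similarity_search("bandana")[0]}


if __name__ == "__main__":
    print(demo())
```

The same embedding function should be passed on reload as was used when saving; a different one would produce query vectors that are not comparable with the stored ones.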
Answer generated by a 🤖
Answer
Thank you for bringing this issue to our attention and providing a solution! Your proposed fix looks great.
We encourage you to contribute to LangChain by creating a pull request with your fix. This will help improve the framework for all users. If you need any assistance with the contribution process, feel free to ask. We appreciate your contribution!
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Thanks @baskaryan