langchain DOC: Bug in loading Chroma from disk (vectorstores/integrations/chroma)


Issue with current documentation:

https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/chroma.html#basic-example-including-saving-to-disk

Environment

macOS
Python 3.10.9
langchain 0.0.228
chromadb 0.3.26

Input file: https://github.com/hwchase17/langchain/blob/v0.0.228/docs/extras/modules/state_of_the_union.txt

Procedure

Run the following Python script, adapted from https://github.com/hwchase17/langchain/blob/v0.0.228/docs/extras/modules/data_connection/vectorstores/integrations/chroma.ipynb (lines marked - and + are changes to the notebook code):

# import
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader

# load the document and split it into chunks
loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()

# split it into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# load it into Chroma
db = Chroma.from_documents(docs, embedding_function)

# query it
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)

# print results
print(docs[0].page_content)

# save to disk
db2 = Chroma.from_documents(docs, embedding_function, persist_directory="./chroma_db")
db2.persist()
-docs = db.similarity_search(query)
+docs = db2.similarity_search(query)

# load from disk
db3 = Chroma(persist_directory="./chroma_db")
-docs = db.similarity_search(query)
+docs = db3.similarity_search(query)  # ValueError raised
print(docs[0].page_content)

Expected behavior

print(docs[0].page_content) prints the top result of the query against db3.

Actual behavior

ValueError: You must provide embeddings or a function to compute them

Traceback (most recent call last):
  File "/.../issue_report.py", line 35, in <module>
    docs = db3.similarity_search(query)
  File "/.../venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 174, in similarity_search
    docs_and_scores = self.similarity_search_with_score(query, k, filter=filter)
  File "/.../venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 242, in similarity_search_with_score
    results = self.__query_collection(
  File "/.../venv/lib/python3.10/site-packages/langchain/utils.py", line 55, in wrapper
    return func(*args, **kwargs)
  File "/.../venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 121, in __query_collection
    return self._collection.query(
  File "/.../venv/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 209, in query
    raise ValueError(
ValueError: You must provide embeddings or a function to compute them

Idea or request for content:

The error is fixed by passing the embedding_function parameter when loading from disk:

-db3 = Chroma(persist_directory="./chroma_db")
+db3 = Chroma(persist_directory="./chroma_db", embedding_function=embedding_function)
docs = db3.similarity_search(query)
print(docs[0].page_content)

(Added) ref: https://github.com/hwchase17/langchain/blob/v0.0.228/langchain/vectorstores/chroma.py#L62
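
Judging from the traceback, the persisted collection appears to hold the stored vectors but not the function that produced them, so a text query cannot be embedded unless the function is supplied again at load time. The sketch below illustrates that mechanism in plain Python. ToyCollection and toy_embed are hypothetical names made up for this illustration; they are not chromadb or langchain code, and only the error message is copied from the traceback above.

```python
from typing import Callable, List, Optional, Sequence

# Type of a function that turns texts into vectors.
Embedder = Callable[[Sequence[str]], List[List[float]]]


class ToyCollection:
    """Hypothetical stand-in for a persisted collection.

    It keeps the stored vectors and texts, but the embedding function
    is optional and is not part of the stored data.
    """

    def __init__(
        self,
        vectors: List[List[float]],
        texts: List[str],
        embedding_function: Optional[Embedder] = None,
    ) -> None:
        self.vectors = vectors
        self.texts = texts
        self.embedding_function = embedding_function

    def query(self, query_texts: Sequence[str], n_results: int = 1) -> List[List[str]]:
        # A text query has to be embedded before it can be compared.
        if self.embedding_function is None:
            raise ValueError(
                "You must provide embeddings or a function to compute them"
            )
        results = []
        for q in self.embedding_function(query_texts):
            # Rank stored vectors by squared Euclidean distance to the query.
            ranked = sorted(
                range(len(self.vectors)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(q, self.vectors[i])),
            )
            results.append([self.texts[i] for i in ranked[:n_results]])
        return results


def toy_embed(texts: Sequence[str]) -> List[List[float]]:
    """Toy embedding (assumption for the demo): counts of 'a', 'e', 'o'."""
    return [[float(t.lower().count(c)) for c in "aeo"] for t in texts]


texts = ["banana", "cherry", "otto"]
stored_vectors = toy_embed(texts)  # what would already be "on disk"

# Loading without the embedding function: text queries fail.
without_fn = ToyCollection(stored_vectors, texts)
try:
    without_fn.query(["banana"])
except ValueError as err:
    print(f"ValueError: {err}")

# Loading with the same embedding function: text queries work.
with_fn = ToyCollection(stored_vectors, texts, embedding_function=toy_embed)
print(with_fn.query(["banana"]))
```

The same reasoning suggests passing the same embedding function that was used when the store was created, since a different model would embed queries into vectors that do not match the stored ones.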

Jul 09 '23 17:07 ftnext

Answer generated by a 🤖

Answer

Thank you for bringing this issue to our attention and providing a solution! Your proposed fix looks great.

We encourage you to contribute to LangChain by creating a pull request with your fix. This will help improve the framework for all users. If you need any assistance with the contribution process, feel free to ask. We appreciate your contribution!

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

Jul 09 '23 17:07 dosubot[bot]

Thanks @baskaryan

Jul 10 '23 11:07 ftnext
