langchain
ChromaDB error when using HuggingFace Embeddings
The following error appears at the end of the script
TypeError: 'NoneType' object is not callable
Exception ignored in: <function PersistentDuckDB.__del__ at 0x7f53e574d4c0>
Traceback (most recent call last):
File ".../.local/lib/python3.9/site-packages/chromadb/db/duckdb.py", line 445, in __del__
AttributeError: 'NoneType' object has no attribute 'info'
... and comes up when doing:
embedding = HuggingFaceEmbeddings(model_name="hiiamsid/sentence_similarity_spanish_es")
docsearch = Chroma.from_documents(texts, embedding, persist_directory=persist_directory)
but doesn't happen with:
embedding = LlamaCppEmbeddings(model_path=path)
I suspect that we have encountered a bug, but fortunately, we have found a workaround to mitigate potential errors with ChromaDB.
https://github.com/hwchase17/langchain/issues/2491#issuecomment-1499274206
Worked beautifully.
The source of the bug is that the __del__ method (https://github.com/chroma-core/chroma/blob/main/chromadb/db/duckdb.py#L444) is getting called after other resources such as logger and os have already been deleted. You can call chroma.persist() before exiting and your data will still be saved, but I don't see any easy way to fix the bug itself.
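To illustrate the explanation above, here is a minimal sketch of the failure mode and the workaround. It does not use chromadb at all: FakeStore is a hypothetical stand-in for a store that writes to disk, and its method names are assumptions chosen to mirror persist(). The point is only that an explicit flush runs while the interpreter is still fully alive, whereas a finalizer may run during shutdown, when module globals can already be None (which is where the "Exception ignored in ... __del__" message comes from).

```python
import atexit


class FakeStore:
    """Hypothetical stand-in for a vector store that writes to disk."""

    def __init__(self):
        self.pending = []  # records added but not yet written
        self.saved = []    # records that have been "written to disk"

    def add(self, record):
        self.pending.append(record)

    def persist(self):
        # Explicit flush: safe because the caller decides when it runs.
        self.saved.extend(self.pending)
        self.pending.clear()

    def __del__(self):
        # Finalizers may run during interpreter shutdown, when globals such
        # as loggers or os may already be gone. Treat this as best effort.
        try:
            self.persist()
        except Exception:
            pass


store = FakeStore()
store.add("doc-1")
store.add("doc-2")

# Option A: persist explicitly as the last step of the script.
store.persist()

# Option B: register the flush so it runs at normal exit, before modules
# are torn down. Calling persist() twice is harmless in this sketch.
atexit.register(store.persist)
```

With the real library, the equivalent of Option A would be calling persist() on the Chroma object before the script ends, as suggested above.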
Hello community, has this issue been resolved? or what's the workaround?
@zhenghax
I believe this has been fixed: https://github.com/chroma-core/chroma/issues/364
I used the "BAAI/bge-base-en" embedding and successfully created a Chroma database.
# Supplying a persist_directory will store the embeddings on disk
persist_directory = '/content/drive/MyDrive/db'
## Here is the new embeddings being used
embedding = model_norm # "BAAI/bge-base-en"
# Load a vector database from the persist directory; pay attention to the parameter: embedding_function
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
I tried to use the "collection" class:
collection = client.get_collection(name='langchain', embedding_function=embedding)
collection.count() # 467
But I was not able to add a record to the collection using this code:
document="""
About the author
Arthur C. Brooks is an American social scientist, the William Henry
Bloomberg Professor of the Practice of Public Leadership at the
Harvard Kennedy School, and Professor of Management Practice at
the Harvard Business School. Prior, he was the president of the
American Enterprise Institute for ten years, where he held the Beth
and Ravenel Curry Chair in Free Enterprise. He has authored eleven
books, including the bestsellers Love Your Enemies and The
Conservative Heart, and writes the popular How to Build a Life
column at The Atlantic. He is also the host of the podcast The Art of
Happiness with Arthur Brooks.
"""
collection.add(
documents=[document],
metadatas=[{"page": 1, "source": "/content/drive/MyDrive/book/about_the_author.pdf"}],
ids=["467"]
)
and I get this error:
TypeError Traceback (most recent call last)
in <cell line: 17>()
15 """
16
---> 17 collection.add(
18 documents=[document],
19 metadatas=[{"page": 1, "source": "/content/drive/MyDrive/book/about_the_author.pdf"}],
1 frames
/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py in _validate_embedding_set(self, ids, embeddings, metadatas, documents, require_embeddings_or_documents)
380 "You must provide embeddings or a function to compute them"
381 )
--> 382 embeddings = self._embedding_function(documents)
383
384 # if embeddings is None:
TypeError: 'HuggingFaceBgeEmbeddings' object is not callable
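The traceback shows chromadb calling the embedding function directly, as self._embedding_function(documents), while LangChain embedding classes expose embed_documents and embed_query rather than __call__. One possible fix is a thin adapter, sketched below. CallableEmbeddings and ToyEmbeddings are hypothetical names introduced only for illustration, and ToyEmbeddings is a stand-in so the sketch runs without downloading a model. The parameter name input is an assumption: some chromadb versions may check the name of this parameter, others accept any positional argument.

```python
from typing import List, Sequence


class CallableEmbeddings:
    """Adapter: exposes a LangChain-style embeddings object as a callable."""

    def __init__(self, embeddings):
        # Any object with an embed_documents(list_of_str) method will do.
        self._embeddings = embeddings

    def __call__(self, input: Sequence[str]) -> List[List[float]]:
        # Delegate to the wrapped object; one vector per input text.
        return self._embeddings.embed_documents(list(input))


class ToyEmbeddings:
    """Hypothetical stand-in for a real model such as HuggingFaceBgeEmbeddings."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Two made-up features per text: length and number of spaces.
        return [[float(len(t)), float(t.count(" "))] for t in texts]


embed_fn = CallableEmbeddings(ToyEmbeddings())
vectors = embed_fn(["hello world", "chroma"])
```

In the code above, this would mean passing embedding_function=CallableEmbeddings(embedding) to client.get_collection(...). Another route that may avoid the problem is to add records through the LangChain wrapper (for example vectordb.add_texts(...)), which calls embed_documents itself.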
Hi, @juanps90! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, the original issue was about a TypeError occurring when using HuggingFace Embeddings with ChromaDB. It seems that a workaround has been found to mitigate potential errors with ChromaDB, and a fix has been implemented. However, a new issue has been reported where a TypeError occurs when trying to add a record to a collection using the HuggingFaceBgeEmbeddings object.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository!
Is there a solution for this? From reading their documentation, it seems you need an API key to use HuggingFaceEmbeddings with Chroma, but not when using LangChain's version of Chroma.
Ideally, I'd like to use open source embeddings models from HuggingFace without paying.
Did you find a good solution? Chroma.py only accepts HuggingFace embeddings, but I would rather use open source embeddings as well.