embedding function not passed properly to Chroma
Hi, I'm running the official Docker image from Chroma and using it via the REST API (I need it in server mode for persistent storage in a production deployment).
When inserting documents (I'm loading PDFs) I'm getting:
chromadb.api.models.Collection No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction
even though I'm passing OpenAIEmbeddings() as the embedding parameter:
embeddings = OpenAIEmbeddings()
chroma_settings = Settings(
    chroma_api_impl="rest",
    chroma_server_host="localhost",
    chroma_server_http_port=8000,
    anonymized_telemetry=False,
)
loader = PyPDFLoader(pdf_url)
pages = loader.load_and_split()
Chroma.from_documents(
    documents=pages, embedding=embeddings, client_settings=chroma_settings
)
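For reference, newer chromadb releases replaced the `Settings(chroma_api_impl="rest", ...)` style with `chromadb.HttpClient`, and LangChain's Chroma wrapper accepts that client directly. A minimal sketch, assuming the post-split package layout (`langchain_community`); `build_vector_store` and `chroma_server_url` are hypothetical helper names, and the imports are kept local so the sketch reads without the packages installed:

```python
def chroma_server_url(host: str = "localhost", port: int = 8000) -> str:
    """Base URL the REST client talks to (handy for a quick health check)."""
    return f"http://{host}:{port}"


def build_vector_store(pages, host: str = "localhost", port: int = 8000):
    """Index `pages` on a Chroma server, passing the embeddings explicitly."""
    # Local imports: this is a sketch, not a drop-in module.
    import chromadb
    from langchain_community.vectorstores import Chroma
    from langchain_community.embeddings import OpenAIEmbeddings

    client = chromadb.HttpClient(host=host, port=port)
    return Chroma.from_documents(
        documents=pages,
        embedding=OpenAIEmbeddings(),
        client=client,  # instead of client_settings=...
    )
```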
In your definition of OpenAIEmbeddings, you need to specify an embeddings model name: model=your_embedding_deployment_name
that's not the case, I've tried that as well
I am experiencing the same issue. I tried specifying the embedding function on the Chroma client as well, but still the same issue.
I'm seeing the same issue.
I have the same issue. The embedding function is defined and was running fine before I dockerized chromadb.
chromadb has an issue where its list_collections logs this error when it shouldn't: https://github.com/chroma-core/chroma/issues/484
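Since the warning is emitted by chromadb's own logger even when LangChain supplies the embeddings, one workaround until #484 lands is to filter that one message. A minimal sketch; the logger name is inferred from the message prefix quoted above ("chromadb.api.models.Collection") and may differ between chromadb versions:

```python
import logging


class NoDefaultEFWarning(logging.Filter):
    """Drops the spurious 'No embedding_function provided' log line."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Keep every record except the misleading default-EF warning.
        return "No embedding_function provided" not in record.getMessage()


# Logger name taken from the message prefix above -- adjust if your
# chromadb version logs it from a different module.
logging.getLogger("chromadb.api.models.Collection").addFilter(NoDefaultEFWarning())
```

This hides the noise only; it does not change which embedding function is actually used, so verify your vectors separately before silencing it.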
I have the same issue.
db = Chroma(persist_directory='./db', embedding_function=OpenAIEmbeddings())
No embedding_function provided, using default embedding function: DefaultEmbeddingFunction https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Any updates?
I had a similar problem where Chroma kept using its default embedding function. After days of struggle, I found a partial solution. At first, I was using "from chromadb.utils import embedding_functions" to import SentenceTransformerEmbeddings, which produced the problem mentioned in this thread.
I happened to find a post which uses "from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings", the langchain package's wrapper for the embedding function, and the problem is solved.
However, I want to use the InstructorEmbeddingFunction recommended by Chroma, so I am still looking for a solution.
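The two import paths matter because they produce objects with different interfaces: chromadb's `embedding_functions` objects are plain callables that chromadb invokes itself, while LangChain's wrappers expose `embed_documents`/`embed_query`, which is what LangChain's Chroma wrapper duck-types on. A small sketch of that distinction, with hypothetical helper and fake classes standing in for the real ones:

```python
def is_langchain_embedding(obj) -> bool:
    """LangChain vector stores duck-type on these two methods; a
    chromadb-style embedding function lacks them, which is one reason
    the chromadb-native import path misbehaves here."""
    return callable(getattr(obj, "embed_documents", None)) and callable(
        getattr(obj, "embed_query", None)
    )


class FakeLangChainEmbeddings:
    """Shape of a LangChain embeddings wrapper."""

    def embed_documents(self, texts):  # list[str] -> list[list[float]]
        return [[0.0] for _ in texts]

    def embed_query(self, text):  # str -> list[float]
        return [0.0]


class FakeChromaEF:
    """Shape of a chromadb embedding_functions object: just a callable."""

    def __call__(self, texts):
        return [[0.0] for _ in texts]
```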
Facing the same issue; I tried this import method but no luck.
Is there any update on this? I am facing the same issue and cannot use the OpenAIEmbeddingFunction, as its dimensionality is 1536 while the default model (384-dimensional) is taken automatically, even when creating the chromadb collection like this:
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    model_name="text-embedding-ada-002"
)
collection = client.create_collection(
    name="leitlinineGPT",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # l2 is the default
)
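A 1536-vs-384 clash like this usually means the collection was at some point created or fetched without the embedding function, so chromadb built the index against its 384-dimensional default. A hedged sketch of one way to guard against that: always pass the embedding function when fetching the collection, and sanity-check dimensionalities first. `get_openai_collection` and `dims_match` are hypothetical helper names; the dimensions in the lookup table come from the models' public documentation:

```python
# Output dimensionalities of the two models involved (per their docs).
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,  # OpenAI
    "all-MiniLM-L6-v2": 384,         # chromadb's default SentenceTransformer
}


def dims_match(model_name: str, index_dim: int) -> bool:
    """A Chroma collection rejects vectors whose dimensionality differs
    from the one its index was created with."""
    return EMBEDDING_DIMS.get(model_name) == index_dim


def get_openai_collection(client, name: str):
    """Fetch-or-create, always passing the embedding function, so a
    pre-existing collection is not silently paired with the default."""
    from chromadb.utils import embedding_functions

    openai_ef = embedding_functions.OpenAIEmbeddingFunction(
        model_name="text-embedding-ada-002"
    )
    return client.get_or_create_collection(
        name=name,
        embedding_function=openai_ef,
        metadata={"hnsw:space": "cosine"},
    )
```

If the collection already contains 384-dimensional vectors, it has to be deleted and re-indexed; the embedding function alone cannot fix the stored index.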
Facing this issue as well
Hi, @meal,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. It seems that the issue involves the embedding function not being passed properly to Chroma when inserting documents using the REST API. Despite passing the OpenAIEmbeddings() function as the embedding parameter, the default SentenceTransformerEmbeddingFunction is being used instead. There have been attempts by several users to resolve this by specifying the embedding function on the Chroma client and trying different import methods for the embedding function. Additionally, there is a mention of a related issue with the list_collections logs.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!
The issue still exists as of writing this comment:
import langchain
print(langchain.__version__)
>>> 0.1.9
import langchain_community
print(langchain_community.__version__)
>>> 0.0.21
I encountered the issue when doing the following operation (I am using Ollama with the mistral model):
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
# load document from web using langchain_community.document_loaders.WebBaseLoader
# ...
split_web_document = text_splitter.split_documents(web_document)
embedding = OllamaEmbeddings(model="mistral", show_progress=True)
vector_store = Chroma.from_documents(split_web_document, embedding) # faulty line
No embedding_function provided, using default embedding function: DefaultEmbeddingFunction https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
OllamaEmbeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 2.60it/s]
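The OllamaEmbeddings progress bar above shows the embeddings did run, which suggests the warning is the spurious one from the linked chromadb issue. One way to double-check is to pull the stored vectors and confirm they share a single dimensionality that is not 384 (the MiniLM default). A sketch; `consistent_dim` is a hypothetical helper, and `_collection` is a private attribute of the LangChain wrapper that may change between versions:

```python
def consistent_dim(vectors) -> int:
    """Return the single dimensionality shared by all stored vectors,
    or raise if the store mixed embedding sources."""
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"mixed embedding dimensionalities: {sorted(dims)}")
    return dims.pop()


# Usage sketch (touches a private attribute of the LangChain wrapper):
# stored = vector_store._collection.get(include=["embeddings"])["embeddings"]
# consistent_dim(stored)  # 384 would mean the MiniLM default actually ran
```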
I believe this issue should be re-opened.
I am having this issue as well.
Facing the same issue with OllamaEmbeddings("llama2")
Are there any updates on this issue? I am still experiencing this problem.