embedding function not passed properly to Chroma

Open meal opened this issue 1 year ago • 8 comments

Hi, I'm running the official Docker image from Chroma and using it via the REST API (I need it in server mode for persistent storage in a production deployment).

When inserting documents (I'm loading PDFs), I'm getting

chromadb.api.models.Collection No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction

even though I'm passing OpenAIEmbeddings() as the embedding parameter:

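# Imports assumed for the LangChain and chromadb versions current at the time of posting
from chromadb.config import Settings
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
chroma_settings = Settings(
    chroma_api_impl="rest",
    chroma_server_host="localhost",
    chroma_server_http_port=8000,
    anonymized_telemetry=False,
)

loader = PyPDFLoader(pdf_url)  # pdf_url is defined elsewhere
pages = loader.load_and_split()
Chroma.from_documents(
    documents=pages, embedding=embeddings, client_settings=chroma_settings
)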

meal avatar Apr 16 '23 15:04 meal

In your OpenAIEmbeddings definition, you need to specify an embedding model name: model=your_embedding_deployment_name

skeretna avatar Apr 17 '23 01:04 skeretna

> In your OpenAIEmbeddings definition, you need to specify an embedding model name: model=your_embedding_deployment_name

That's not the case; I've tried that as well.

meal avatar Apr 17 '23 06:04 meal

I am experiencing the same issue. I tried specifying the embedding function on the Chroma client as well, but the problem persists.

baseplate77 avatar May 02 '23 16:05 baseplate77

I'm seeing the same issue.

akeybl avatar May 13 '23 03:05 akeybl

I have the same issue. The embedding function is defined and was running fine before I dockerized chromadb.

mlorenzon avatar May 14 '23 10:05 mlorenzon

chromadb has an issue where its list_collections logs this error when it shouldn't: https://github.com/chroma-core/chroma/issues/484
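
One way to check whether the message is only log noise is to capture it with a standard-library logging filter. This is a minimal sketch, assuming the warning goes through a logger named chromadb.api.models.Collection (the prefix shown in the log line in the original post); WarningCatcher is a made-up name for illustration.

```python
import logging

# Assumption: the logger name matches the prefix of the log line quoted in this thread.
CHROMA_LOGGER = "chromadb.api.models.Collection"


class WarningCatcher(logging.Filter):
    """Hypothetical helper: records the default-embedding warning and hides it."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def filter(self, record):
        message = record.getMessage()
        if "No embedding_function provided" in message:
            self.hits.append(message)
            return False  # drop the record so it is not printed
        return True  # let every other record through


catcher = WarningCatcher()
logging.getLogger(CHROMA_LOGGER).addFilter(catcher)
```

After running the insert, catcher.hits shows how often the message fired. Whether the stored embeddings are actually wrong has to be checked separately.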

tonisives avatar May 15 '23 01:05 tonisives

I have the same issue.

db = Chroma(persist_directory='./db', embedding_function=OpenAIEmbeddings())

No embedding_function provided, using default embedding function: DefaultEmbeddingFunction https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

h3clikejava avatar May 31 '23 04:05 h3clikejava

Any updates?

sedgar03 avatar Jun 02 '23 19:06 sedgar03

I had a similar problem, although I am using Chroma's default embedding function. After days of struggle, I found a partial solution. At first, I was using "from chromadb.utils import embedding_functions" to import SentenceTransformerEmbeddings, which produced the problem mentioned in this thread.

I happened to find a post that uses "from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings", a LangChain package, to get the embedding function, and that solved the problem.

However, I want to use the InstructorEmbeddingFunction recommended by Chroma, and I am still looking for a solution.
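
For a Chroma-native embedding function (such as InstructorEmbeddingFunction) behind the LangChain wrapper, a thin adapter might be worth trying. This is only a sketch under two assumptions: that the Chroma function can be called with a list of strings, and that the LangChain wrapper only needs the embed_documents and embed_query methods. CallableEmbeddingsAdapter is a made-up name, not part of either library.

```python
from typing import Callable, List, Sequence


class CallableEmbeddingsAdapter:
    """Hypothetical adapter: exposes embed_documents/embed_query on top of a
    plain callable that maps a list of texts to a list of vectors."""

    def __init__(self, fn: Callable[[List[str]], Sequence[Sequence[float]]]):
        self._fn = fn

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Convert to plain lists of floats in case fn returns arrays.
        return [[float(x) for x in vector] for vector in self._fn(list(texts))]

    def embed_query(self, text: str) -> List[float]:
        # A query is embedded as a single-item batch.
        return self.embed_documents([text])[0]
```

The wrapped object would then be passed wherever the thread passes OpenAIEmbeddings(), for example as the embedding argument.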

cfa532 avatar Aug 17 '23 02:08 cfa532

> I had a similar problem, although I am using Chroma's default embedding function. After days of struggle, I found a partial solution. At first, I was using "from chromadb.utils import embedding_functions" to import SentenceTransformerEmbeddings, which produced the problem mentioned in this thread.
>
> I happened to find a post that uses "from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings", a LangChain package, to get the embedding function, and that solved the problem.
>
> However, I want to use the InstructorEmbeddingFunction recommended by Chroma, and I am still looking for a solution.

Facing the same issue. I tried this import method, but no luck.

sanjayporwal02 avatar Nov 01 '23 10:11 sanjayporwal02

Is there any update on this? I am facing the same issue and cannot use the OpenAIEmbeddingFunction: its dimensionality is 1536, but the default model (384-dimensional) is picked automatically, even when creating the collection as follows:

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    model_name="text-embedding-ada-002"
)
collection = client.create_collection(
    name="leitlinineGPT",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # l2 is the default
)
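
A quick dimension check can show which model actually produced a vector. The sketch below assumes the embedding function is a callable that takes a list of strings and returns one vector per string; embedding_dim and assert_dim are hypothetical helper names, and the two constants are the dimensions mentioned in this comment.

```python
from typing import Callable, List, Sequence

# Dimensions mentioned in this thread.
ADA_002_DIM = 1536
MINILM_DIM = 384


def embedding_dim(embed: Callable[[List[str]], Sequence[Sequence[float]]],
                  probe: str = "dimension probe") -> int:
    """Return the length of the vector produced for a single probe text."""
    vectors = embed([probe])
    return len(vectors[0])


def assert_dim(embed, expected: int) -> None:
    """Raise early if the embedding function does not produce `expected` dims."""
    actual = embedding_dim(embed)
    if actual != expected:
        raise ValueError(f"expected {expected}-dimensional vectors, got {actual}")
```

Comparing embedding_dim(openai_ef) with the length of a vector read back from the collection (for example via collection.get(include=["embeddings"]), if your chromadb version supports it) would show whether the 384-dimensional default was really used.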

J-Marlon-H avatar Nov 17 '23 13:11 J-Marlon-H

Facing this issue as well

alyhafez95 avatar Nov 17 '23 19:11 alyhafez95

Hi, @meal,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. It seems that the issue involves the embedding function not being passed properly to Chroma when inserting documents using the rest API. Despite passing the OpenAIEmbeddings() function as the embedding parameter, the default SentenceTransformerEmbeddingFunction is being used instead. There have been attempts by several users to resolve this by specifying the embedding function on the Chroma client and trying different import methods for the embedding function. Additionally, there is a mention of a related issue with the list_collections logs.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!

dosubot[bot] avatar Feb 16 '24 16:02 dosubot[bot]

The issue still exists as of writing this comment:

import langchain
print(langchain.__version__)
>>> 0.1.9

import langchain_community 
print(langchain_community.__version__)
>>> 0.0.21

I encountered the issue when doing the following operation (I am using Ollama with mistral model):

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

# load document from web using langchain_community.document_loaders.WebBaseLoader
# ...
split_web_document = text_splitter.split_documents(web_document)
embedding = OllamaEmbeddings(model="mistral", show_progress=True)
vector_store = Chroma.from_documents(split_web_document, embedding) # faulty line 
No embedding_function provided, using default embedding function: DefaultEmbeddingFunction https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
OllamaEmbeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  2.60it/s]

I believe this issue should be re-opened.
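
The progress bar in the output above suggests that OllamaEmbeddings was actually called despite the warning. A call-counting proxy is one way to confirm that. This sketch assumes Chroma.from_documents only calls embed_documents and embed_query on the object it is given; CountingEmbeddings is a made-up name.

```python
class CountingEmbeddings:
    """Hypothetical proxy: forwards to a LangChain-style embeddings object
    and counts how many texts it was asked to embed."""

    def __init__(self, inner):
        self._inner = inner
        self.documents_embedded = 0
        self.queries_embedded = 0

    def embed_documents(self, texts):
        # Count the batch, then delegate to the wrapped embeddings object.
        self.documents_embedded += len(texts)
        return self._inner.embed_documents(texts)

    def embed_query(self, text):
        self.queries_embedded += 1
        return self._inner.embed_query(text)
```

Passing CountingEmbeddings(embedding) instead of embedding and checking documents_embedded afterwards would show whether the custom embeddings were used for the inserted chunks.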

pai-sameen avatar Feb 24 '24 10:02 pai-sameen

I am having this issue as well.

vanessailana avatar Feb 27 '24 22:02 vanessailana

Facing the same issue with OllamaEmbeddings("llama2")

nitinnat avatar Mar 16 '24 20:03 nitinnat