embedding function not passed properly to Chroma
Hi, I'm running the official Docker image from Chroma and using it via the REST API (I need it in server mode for persistent storage in a production deployment).
When inserting documents (I'm loading PDFs) I'm getting:
chromadb.api.models.Collection No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction
even though I'm passing OpenAIEmbeddings() as the embedding parameter:
embeddings = OpenAIEmbeddings()
chroma_settings = Settings(
    chroma_api_impl="rest",
    chroma_server_host="localhost",
    chroma_server_http_port=8000,
    anonymized_telemetry=False,
)
loader = PyPDFLoader(pdf_url)
pages = loader.load_and_split()
Chroma.from_documents(
    documents=pages, embedding=embeddings, client_settings=chroma_settings
)
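For reference, newer chromadb releases replaced the `Settings(chroma_api_impl="rest", ...)` style with `chromadb.HttpClient`, and LangChain's Chroma wrapper accepts that client directly. A minimal sketch, assuming the post-split package layout (`langchain_community`); `build_vector_store` and `chroma_server_url` are hypothetical helper names, and the imports are kept local so the sketch reads without the packages installed:

```python
def chroma_server_url(host: str = "localhost", port: int = 8000) -> str:
    """Base URL the REST client talks to (handy for a quick health check)."""
    return f"http://{host}:{port}"


def build_vector_store(pages, host: str = "localhost", port: int = 8000):
    """Index `pages` on a Chroma server, passing the embeddings explicitly."""
    # Local imports: this is a sketch, not a drop-in module.
    import chromadb
    from langchain_community.vectorstores import Chroma
    from langchain_community.embeddings import OpenAIEmbeddings

    client = chromadb.HttpClient(host=host, port=port)
    return Chroma.from_documents(
        documents=pages,
        embedding=OpenAIEmbeddings(),
        client=client,  # instead of client_settings=...
    )
```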
In your definition of OpenAIEmbeddings, you need to specify an embeddings model name: model=your_embedding_deployment_name
that's not the case, I've tried that as well
I am experiencing the same issue. I tried specifying the embedding function on the Chroma client as well, but still the same issue.
I'm seeing the same issue.
I have the same issue. The embedding function is defined and was running fine before I dockerized chromadb.
chromadb has an issue where its list_collections logs this error when it shouldn't: https://github.com/chroma-core/chroma/issues/484
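Since the warning is emitted by chromadb's own logger even when LangChain supplies the embeddings, one workaround until #484 lands is to filter that one message. A minimal sketch; the logger name is inferred from the message prefix quoted above ("chromadb.api.models.Collection") and may differ between chromadb versions:

```python
import logging


class NoDefaultEFWarning(logging.Filter):
    """Drops the spurious 'No embedding_function provided' log line."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Keep every record except the misleading default-EF warning.
        return "No embedding_function provided" not in record.getMessage()


# Logger name taken from the message prefix above -- adjust if your
# chromadb version logs it from a different module.
logging.getLogger("chromadb.api.models.Collection").addFilter(NoDefaultEFWarning())
```

This hides the noise only; it does not change which embedding function is actually used, so verify your vectors separately before silencing it.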
I have the same issue.
db = Chroma(persist_directory='./db', embedding_function=OpenAIEmbeddings())
No embedding_function provided, using default embedding function: DefaultEmbeddingFunction https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Any updates?
I had a similar problem where Chroma kept using its default embedding function. After days of struggle, I found a partial solution. At first, I was using "from chromadb.utils import embedding_functions" to import SentenceTransformerEmbeddings, which produced the problem mentioned in this thread.
I happened to find a post which uses "from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings", the langchain package's wrapper for the embedding function, and the problem is solved.
However, I want to use the InstructorEmbeddingFunction recommended by Chroma, so I am still looking for a solution.
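The two import paths matter because they produce objects with different interfaces: chromadb's `embedding_functions` objects are plain callables that chromadb invokes itself, while LangChain's wrappers expose `embed_documents`/`embed_query`, which is what LangChain's Chroma wrapper duck-types on. A small sketch of that distinction, with hypothetical helper and fake classes standing in for the real ones:

```python
def is_langchain_embedding(obj) -> bool:
    """LangChain vector stores duck-type on these two methods; a
    chromadb-style embedding function lacks them, which is one reason
    the chromadb-native import path misbehaves here."""
    return callable(getattr(obj, "embed_documents", None)) and callable(
        getattr(obj, "embed_query", None)
    )


class FakeLangChainEmbeddings:
    """Shape of a LangChain embeddings wrapper."""

    def embed_documents(self, texts):  # list[str] -> list[list[float]]
        return [[0.0] for _ in texts]

    def embed_query(self, text):  # str -> list[float]
        return [0.0]


class FakeChromaEF:
    """Shape of a chromadb embedding_functions object: just a callable."""

    def __call__(self, texts):
        return [[0.0] for _ in texts]
```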
Facing the same issue; I tried this import method but no luck.
Is there any update on this? I am facing the same issue and cannot use the OpenAIEmbeddingFunction, as its dimensionality is 1536 while the default model (384-dimensional) is taken automatically, even when creating the chromadb collection like this:
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    model_name="text-embedding-ada-002"
)
collection = client.create_collection(
    name="leitlinineGPT",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # l2 is the default
)
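A 1536-vs-384 clash like this usually means the collection was at some point created or fetched without the embedding function, so chromadb built the index against its 384-dimensional default. A hedged sketch of one way to guard against that: always pass the embedding function when fetching the collection, and sanity-check dimensionalities first. `get_openai_collection` and `dims_match` are hypothetical helper names; the dimensions in the lookup table come from the models' public documentation:

```python
# Output dimensionalities of the two models involved (per their docs).
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,  # OpenAI
    "all-MiniLM-L6-v2": 384,         # chromadb's default SentenceTransformer
}


def dims_match(model_name: str, index_dim: int) -> bool:
    """A Chroma collection rejects vectors whose dimensionality differs
    from the one its index was created with."""
    return EMBEDDING_DIMS.get(model_name) == index_dim


def get_openai_collection(client, name: str):
    """Fetch-or-create, always passing the embedding function, so a
    pre-existing collection is not silently paired with the default."""
    from chromadb.utils import embedding_functions

    openai_ef = embedding_functions.OpenAIEmbeddingFunction(
        model_name="text-embedding-ada-002"
    )
    return client.get_or_create_collection(
        name=name,
        embedding_function=openai_ef,
        metadata={"hnsw:space": "cosine"},
    )
```

If the collection already contains 384-dimensional vectors, it has to be deleted and re-indexed; the embedding function alone cannot fix the stored index.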
Facing this issue as well
Hi, @meal,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. It seems that the issue involves the embedding function not being passed properly to Chroma when inserting documents using the REST API. Despite passing the OpenAIEmbeddings() function as the embedding parameter, the default SentenceTransformerEmbeddingFunction is being used instead. There have been attempts by several users to resolve this by specifying the embedding function on the Chroma client and trying different import methods for the embedding function. Additionally, there is a mention of a related issue with the list_collections logs.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!
The issue still exists as of writing this comment:
import langchain
print(langchain.__version__)
>>> 0.1.9
import langchain_community
print(langchain_community.__version__)
>>> 0.0.21
I encountered the issue when doing the following operation (I am using Ollama with the mistral model):
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
# load document from web using langchain_community.document_loaders.WebBaseLoader
# ...
split_web_document = text_splitter.split_documents(web_document)
embedding = OllamaEmbeddings(model="mistral", show_progress=True)
vector_store = Chroma.from_documents(split_web_document, embedding) # faulty line
No embedding_function provided, using default embedding function: DefaultEmbeddingFunction https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
OllamaEmbeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 2.60it/s]
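The OllamaEmbeddings progress bar above shows the embeddings did run, which suggests the warning is the spurious one from the linked chromadb issue. One way to double-check is to pull the stored vectors and confirm they share a single dimensionality that is not 384 (the MiniLM default). A sketch; `consistent_dim` is a hypothetical helper, and `_collection` is a private attribute of the LangChain wrapper that may change between versions:

```python
def consistent_dim(vectors) -> int:
    """Return the single dimensionality shared by all stored vectors,
    or raise if the store mixed embedding sources."""
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"mixed embedding dimensionalities: {sorted(dims)}")
    return dims.pop()


# Usage sketch (touches a private attribute of the LangChain wrapper):
# stored = vector_store._collection.get(include=["embeddings"])["embeddings"]
# consistent_dim(stored)  # 384 would mean the MiniLM default actually ran
```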
I believe this issue should be re-opened.
I am having this issue as well.
Facing the same issue with OllamaEmbeddings("llama2")
Are there any updates on this issue? I am still experiencing this problem.