Issue: Chroma DB

### Issue you'd like to raise.

Hi,

For the following code I get a dimension error:

`InvalidDimensionException: Dimensionality of (1536) does not match index dimensionality (384)`
```python
persist_directory = "chromaDB_csv"

vectordb = None
vectordb = Chroma.from_documents(
    documents=docs, embeddings=embeddingsFunction, persist_directory=persist_directory
)
vectordb.persist()

AzureOpenAI.api_type = "azure"
llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    engine="gpt-35-turbo",
    openai_api_base=os.getenv("OPENAI_API_BASE"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    openai_api_type="azure",
    openai_api_version="2023-03-15-preview",
)

print(os.getenv("OPENAI_API_BASE"))
print(os.getenv("OPENAI_API_KEY"))

vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddingsFunction)

search_kwargs = {
    "maximal_marginal_relevance": True,
    "distance_metric": "cos",
    "fetch_k": 100,
    "k": 10,
}
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs=search_kwargs)

chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    chain_type="stuff",
    verbose=True,
    max_tokens_limit=4096,
)
chain({"question": "ABC ABC ABC ABC", "chat_history": []})
```
### Suggestion:
_No response_
You need to set the embeddings dimension value in your vectorstore according to the embeddings function.
@preritdas I have never seen an example where you set it manually… can you provide an example, please?
It looks like the error occurs because I loaded CSV rows with different text lengths and did not split them… could this be the issue? I have built many chains like this before and it always worked.
What is your `embeddingsFunction`? You need to provide some details.
@meal the embeddings function looks like: `embeddingsFunction = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)`

I have copied the code again into another Jupyter notebook and it worked without any issues :/ Same code... Maybe some local cache issue?
I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.
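To make the mechanism concrete, here is a minimal, self-contained sketch of what is going on. The `TinyIndex` class is purely hypothetical (it is not Chroma's implementation): the first stored vector fixes the index dimensionality, and any later vector of a different size is rejected with the same kind of message.

```python
# Hypothetical toy index illustrating the dimension mismatch; this is NOT
# Chroma's code, just the general mechanism behind the error.
class TinyIndex:
    def __init__(self):
        self.dim = None
        self.vectors = []

    def add(self, vec):
        if self.dim is None:
            self.dim = len(vec)  # the first vector fixes the index dimensionality
        if len(vec) != self.dim:
            raise ValueError(
                f"Dimensionality of ({len(vec)}) does not match "
                f"index dimensionality ({self.dim})"
            )
        self.vectors.append(vec)

index = TinyIndex()
index.add([0.0] * 384)        # old run: a 384-dim model built the on-disk index
try:
    index.add([0.0] * 1536)   # new run: an ada-002-style 1536-dim vector
except ValueError as err:
    print(err)                # -> Dimensionality of (1536) does not match index dimensionality (384)
```

This is why re-running the same code against a persist directory that was first populated by a different embedding model fails, while a fresh directory works.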
@martinholecekmax I think this is the solution... I don't know why the embeddings were created with different dimensions from the same code and input, but it worked.
Sorry, but where is the folder on Windows?
@qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me:

```python
from chromadb.errors import InvalidDimensionException

try:
    docsearch = Chroma.from_documents(documents=..., embedding=...)
except InvalidDimensionException:
    Chroma().delete_collection()
    docsearch = Chroma.from_documents(documents=..., embedding=...)
```
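Equivalently, if you do know the persist directory, you can remove the stale index programmatically before rebuilding it. A small sketch; the path is just the `persist_directory` assumed from the original snippet, so adjust it to yours:

```python
import os
import shutil

persist_directory = "chromaDB_csv"  # path assumed from the snippet above; adjust to yours

# Remove the stale on-disk index so it gets rebuilt with the new embedding dimension.
if os.path.isdir(persist_directory):
    shutil.rmtree(persist_directory)

# Then recreate the store, e.g.:
# docsearch = Chroma.from_documents(
#     documents=..., embedding=..., persist_directory=persist_directory
# )
```

Note this deletes all embeddings stored under that directory, so everything gets re-embedded on the next run.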
Thanks @decadance-dance, I found out that it was saving data in the same folder where the Python script was, lol.
> @qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me.

Solved. Thanks!
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Yes! That's the solution!
I have the same issue. I removed the Chroma DB folder and tried again, but it still does not work. Here is my code:

```python
chroma_client = createChromaClient()
# chroma_client.delete_collection(name=chroma_collection_name)
chroma_collection = chroma_client.get_or_create_collection(name=chroma_collection_name)

for doc in splits:
    chroma_collection.add(
        ids=[str(uuid.uuid1())], metadatas=doc.metadata, documents=doc.page_content
    )
print("docs added to collection")

db = Chroma(
    client=chroma_client,
    collection_name=chroma_collection_name,
    embedding_function=ch_embed,
)
```

Retrieval:

```python
retriever = db.as_retriever(search_kwargs={"k": 2})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
print(f"qa_chain--{qa_chain}")
response = qa_chain({"query": question})
return response["result"]
```

I get the error at the line `response = qa_chain({"query": question})`. Can somebody please help me see what is wrong with the above?
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Love you bro
> @meal the embeddings function looks like: `embeddingsFunction = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)`
> I have copied the code again into another Jupyter notebook and it worked without any issues :/ Same code... Maybe some local cache issue?

The same happened to me.
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Fair explanation, thank you!
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

🐐 It worked, thank you!!
> @qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me.

This works like a charm!
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Can confirm that simply removing the stored DB tables will clear this error; it means you had previously run embeddings using a different strategy than the current one.

Also, if you are doing some trial and error with your code and don't want to delete the folder every time, you can opt for not persisting the database. For example, instead of a line such as

```python
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")
```

you could use this:

```python
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings)
```
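Another option, if you do want persistence across runs: derive the persist directory from the embedding model name, so indexes built with different models never share a folder. This is just a sketch of a naming scheme (the names are my own suggestion, not a langchain/Chroma convention), reusing the `Chroma.from_documents` call shown above:

```python
# Hypothetical naming scheme: one persist directory per embedding model, so
# indexes with different dimensionalities never end up in the same folder.
model_name = "text-embedding-ada-002"
persist_directory = f"chroma_db_{model_name.replace('-', '_')}"
print(persist_directory)  # -> chroma_db_text_embedding_ada_002

# vectordb = Chroma.from_documents(
#     documents=all_splits, embedding=embeddings, persist_directory=persist_directory
# )
```

Switching models then simply switches directories, instead of tripping over a stale index built with the old dimensionality.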
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

You saved my life!!! Thank you!!!