Issue: Chroma DB

Open FL0S0T opened this issue 1 year ago • 4 comments

### Issue you'd like to raise.

Hi,

The following code raises a dimension error: `InvalidDimensionException: Dimensionality of (1536) does not match index dimensionality (384)`

```python
persist_directory = "chromaDB_csv"
vectordb = None

vectordb = Chroma.from_documents(
    documents=docs, embeddings=embeddingsFunction, persist_directory=persist_directory
)
vectordb.persist()


AzureOpenAI.api_type = "azure"
llm = AzureChatOpenAI(
        deployment_name="gpt-35-turbo",
        engine="gpt-35-turbo", 
        openai_api_base=os.getenv('OPENAI_API_BASE'),
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        openai_api_type = "azure",
        openai_api_version = "2023-03-15-preview"
    )

print(os.getenv('OPENAI_API_BASE'))
print(os.getenv('OPENAI_API_KEY'))

vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddingsFunction )
search_kwargs = {
    "maximal_marginal_relevance": True,
    "distance_metric": "cos",
    "fetch_k": 100,
    "k": 10,
}

retriever = vectordb.as_retriever(search_type="mmr", search_kwargs=search_kwargs)

chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    chain_type="stuff",
    verbose=True,
    max_tokens_limit=4096,
)

chain({"question": "ABC ABC ABC ABC", "chat_history":[]})
```


### Suggestion:

_No response_

FL0S0T avatar May 20 '23 20:05 FL0S0T

You need to set the embeddings dimension value in your vectorstore according to the embeddings function.

preritdas avatar May 21 '23 03:05 preritdas

@preritdas I have never seen an example where you set it manually… can you provide an example please?

It looks like the error occurs because I loaded CSV rows with different text lengths and did not split them… could this be the issue? I have built many chains like this and it always worked.

FL0S0T avatar May 21 '23 10:05 FL0S0T

What is your `embeddingsFunction`? You need to provide one.

meal avatar May 21 '23 16:05 meal

@meal the embeddings function looks like: `embeddingsFunction = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)`

I copied the code into another Jupyter notebook and it worked without any issues :/ Same code... Maybe some local cache issue?

FL0S0T avatar May 21 '23 18:05 FL0S0T

I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.
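
A minimal sketch of that cleanup, assuming the store lives in a local directory such as the `chromaDB_csv` folder from the original snippet. The helper name `reset_persist_dir` is hypothetical, and it uses only the Python standard library rather than any Chroma API. Everything in the folder is deleted, so the documents have to be embedded again afterwards.

```python
import shutil
from pathlib import Path


def reset_persist_dir(persist_directory: str) -> bool:
    """Delete a local vector store folder; return True if something was removed.

    Hypothetical helper: it only touches the filesystem and knows nothing
    about Chroma itself.
    """
    path = Path(persist_directory)
    if not path.is_dir():
        return False  # nothing stored yet, so nothing to clean up
    shutil.rmtree(path)  # removes the folder and every file inside it
    return True
```

For the snippet at the top of this issue, that would mean calling `reset_persist_dir("chromaDB_csv")` once before rebuilding the store with `Chroma.from_documents(...)`.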

martinholecekmax avatar May 23 '23 23:05 martinholecekmax

@martinholecekmax I think this is the solution... I don't know why the embeddings are created with different dimensions with the same code and input but it worked.

FL0S0T avatar May 24 '23 06:05 FL0S0T

Sorry, but where is the folder on Windows?

qayyumabro avatar Aug 01 '23 13:08 qayyumabro

@qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me.

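from chromadb.errors import InvalidDimensionException
from langchain.vectorstores import Chroma  # import path used by LangChain when this was posted

try:
    docsearch = Chroma.from_documents(documents=..., embedding=...)
except InvalidDimensionException:
    # Delete the existing default collection, then build it again
    Chroma().delete_collection()
    docsearch = Chroma.from_documents(documents=..., embedding=...)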

decadance-dance avatar Aug 03 '23 12:08 decadance-dance

Thanks @decadance-dance, I found out that it was saving the data in the same folder as the Python script, lol.
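
A small sketch of why the folder can end up there, assuming `persist_directory` is given as a relative path as in the original snippet: a relative path is interpreted against the current working directory of the Python process, which is often the folder the script was started from. The helper name `locate_persist_dir` is made up for this illustration.

```python
from pathlib import Path


def locate_persist_dir(persist_directory: str) -> Path:
    """Return the absolute location a (possibly relative) path refers to."""
    # Joining with the current working directory leaves absolute paths unchanged
    return Path.cwd() / persist_directory


# Prints the full path that the relative name "chromaDB_csv" points to
print(locate_persist_dir("chromaDB_csv"))
```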

qayyumabro avatar Aug 03 '23 12:08 qayyumabro

> @qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me.
>
>     from chromadb.errors import InvalidDimensionException
>
>     try:
>         docsearch = Chroma.from_documents(documents=..., embedding=...)
>     except InvalidDimensionException:
>         Chroma().delete_collection()
>         docsearch = Chroma.from_documents(documents=..., embedding=...)

Solved. Thanks

wujianming1996 avatar Sep 04 '23 07:09 wujianming1996

> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Yes! That's the solution!

LYCnight avatar Dec 22 '23 13:12 LYCnight

I have the same issue. I removed the chroma db folder and tried again, but it still does not work. Here is my code:

    chroma_client = createChromaClient()
    # chroma_client.delete_collection(name=chroma_collection_name)
    chroma_collection = chroma_client.get_or_create_collection(name=chroma_collection_name)
    for doc in splits:
        chroma_collection.add(
            ids=[str(uuid.uuid1())], metadatas=doc.metadata, documents=doc.page_content
        )
    print("docs added to collection")
    db = Chroma(
        client=chroma_client,
        collection_name=chroma_collection_name,
        embedding_function=ch_embed,
    )

Retrieval:

    retriever = db.as_retriever(search_kwargs={"k": 2})
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    )
    print(f"qa_chain--{qa_chain}")
    response = qa_chain({"query": question})
    return response["result"]

I get the error at the line `response = qa_chain({"query":question})`.

Can somebody please help me see what is wrong with the above?

rbshah2488 avatar Jan 02 '24 01:01 rbshah2488

> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Love you bro

esakkiappan444 avatar Jan 24 '24 06:01 esakkiappan444

> @meal the embeddings function looks like: `embeddingsFunction = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)`
>
> I copied the code into another Jupyter notebook and it worked without any issues :/ Same code... Maybe some local cache issue?

The same happened to me.

younesidsouguou avatar Feb 29 '24 13:02 younesidsouguou

> > I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.
>
> Love you bro

Fair explanation, thank you!

younesidsouguou avatar Feb 29 '24 13:02 younesidsouguou

> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

🐐 It worked, thank you!!

mujeeb-gh avatar Apr 29 '24 21:04 mujeeb-gh

> @qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me.
>
>     from chromadb.errors import InvalidDimensionException
>
>     try:
>         docsearch = Chroma.from_documents(documents=..., embedding=...)
>     except InvalidDimensionException:
>         Chroma().delete_collection()
>         docsearch = Chroma.from_documents(documents=..., embedding=...)

This works like a charm!

nathnx avatar May 06 '24 09:05 nathnx

> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Can confirm: simply removing the stored DB tables clears this error. You had previously run embeddings using a different strategy than this one.
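
To make such a mismatch visible before querying, one option is to compare the length of a freshly computed embedding with the dimensionality of the existing index. The sketch below is only an illustration under stated assumptions: `check_dimension` and `fake_ada_embed` are hypothetical names, the embedder is a stub standing in for a real embedding call, and the numbers 1536 and 384 are taken from the error message at the top of this issue.

```python
from typing import Callable, List


def check_dimension(embed_query: Callable[[str], List[float]], index_dim: int) -> int:
    """Raise ValueError if the embedder's output length differs from index_dim."""
    dim = len(embed_query("dimension probe"))
    if dim != index_dim:
        raise ValueError(
            f"Embedding dimensionality ({dim}) does not match "
            f"index dimensionality ({index_dim})"
        )
    return dim


def fake_ada_embed(text: str) -> List[float]:
    # Stub: returns a 1536-long vector, the size reported in the error above
    return [0.0] * 1536


try:
    check_dimension(fake_ada_embed, index_dim=384)
except ValueError as err:
    print(err)
```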

yahyaghani avatar May 08 '24 05:05 yahyaghani

Also, if you are doing some trial and error with your code and don't want to delete the folder every time, you could opt for not persisting the database. For example, instead of a line like

vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")

you could use this:

vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings)

GencerKoc avatar Jun 24 '24 00:06 GencerKoc

> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

You saved my life!!! Thank you!!!

johe-a avatar Jul 29 '24 09:07 johe-a