Issue: Chroma DB

### Issue you'd like to raise.

Hi,

For the following code I get a dimension error:

`InvalidDimensionException: Dimensionality of (1536) does not match index dimensionality (384)`
```python
persist_directory = "chromaDB_csv"

vectordb = None
vectordb = Chroma.from_documents(
    documents=docs, embeddings=embeddingsFunction, persist_directory=persist_directory
)
vectordb.persist()

AzureOpenAI.api_type = "azure"
llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    engine="gpt-35-turbo",
    openai_api_base=os.getenv("OPENAI_API_BASE"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    openai_api_type="azure",
    openai_api_version="2023-03-15-preview",
)

print(os.getenv("OPENAI_API_BASE"))
print(os.getenv("OPENAI_API_KEY"))

vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddingsFunction)

search_kwargs = {
    "maximal_marginal_relevance": True,
    "distance_metric": "cos",
    "fetch_k": 100,
    "k": 10,
}
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs=search_kwargs)

chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    chain_type="stuff",
    verbose=True,
    max_tokens_limit=4096,
)
chain({"question": "ABC ABC ABC ABC", "chat_history": []})
```
### Suggestion:
_No response_
You need to set the embeddings dimension value in your vectorstore according to the embeddings function.
@preritdas I have never seen an example where you set it manually… can you provide an example, please?
It looks like the error occurs because I loaded CSV rows with different text lengths and did not split them… could this be the issue? I have built many chains like this before and it always worked.
What is your `embeddingsFunction`? You need to provide some details.
@meal the embeddings function looks like: `embeddingsFunction = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)`

I have copied the code again into another Jupyter notebook and it worked without any issues :/ Same code... Maybe some local cache issue?
I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.
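To make the mechanism concrete, here is a minimal, self-contained sketch of what is going on. The `TinyIndex` class is purely hypothetical (it is not Chroma's implementation): the first stored vector fixes the index dimensionality, and any later vector of a different size is rejected with the same kind of message.

```python
# Hypothetical toy index illustrating the dimension mismatch; this is NOT
# Chroma's code, just the general mechanism behind the error.
class TinyIndex:
    def __init__(self):
        self.dim = None
        self.vectors = []

    def add(self, vec):
        if self.dim is None:
            self.dim = len(vec)  # the first vector fixes the index dimensionality
        if len(vec) != self.dim:
            raise ValueError(
                f"Dimensionality of ({len(vec)}) does not match "
                f"index dimensionality ({self.dim})"
            )
        self.vectors.append(vec)

index = TinyIndex()
index.add([0.0] * 384)        # old run: a 384-dim model built the on-disk index
try:
    index.add([0.0] * 1536)   # new run: an ada-002-style 1536-dim vector
except ValueError as err:
    print(err)                # -> Dimensionality of (1536) does not match index dimensionality (384)
```

This is why re-running the same code against a persist directory that was first populated by a different embedding model fails, while a fresh directory works.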
@martinholecekmax I think this is the solution... I don't know why the embeddings were created with different dimensions from the same code and input, but it worked.
Sorry, but where is the folder on Windows?
@qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me:

```python
from chromadb.errors import InvalidDimensionException

try:
    docsearch = Chroma.from_documents(documents=..., embedding=...)
except InvalidDimensionException:
    Chroma().delete_collection()
    docsearch = Chroma.from_documents(documents=..., embedding=...)
```
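Equivalently, if you do know the persist directory, you can remove the stale index programmatically before rebuilding it. A small sketch; the path is just the `persist_directory` assumed from the original snippet, so adjust it to yours:

```python
import os
import shutil

persist_directory = "chromaDB_csv"  # path assumed from the snippet above; adjust to yours

# Remove the stale on-disk index so it gets rebuilt with the new embedding dimension.
if os.path.isdir(persist_directory):
    shutil.rmtree(persist_directory)

# Then recreate the store, e.g.:
# docsearch = Chroma.from_documents(
#     documents=..., embedding=..., persist_directory=persist_directory
# )
```

Note this deletes all embeddings stored under that directory, so everything gets re-embedded on the next run.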
Thanks @decadance-dance, I found out that it was saving data in the same folder where the Python script was, lol.
> @qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me.

Solved. Thanks!
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Yes! That's the solution!
I have the same issue. I removed the Chroma DB folder and tried again, but it still does not work. Here is my code:

```python
chroma_client = createChromaClient()
# chroma_client.delete_collection(name=chroma_collection_name)
chroma_collection = chroma_client.get_or_create_collection(name=chroma_collection_name)

for doc in splits:
    chroma_collection.add(
        ids=[str(uuid.uuid1())], metadatas=doc.metadata, documents=doc.page_content
    )
print("docs added to collection")

db = Chroma(
    client=chroma_client,
    collection_name=chroma_collection_name,
    embedding_function=ch_embed,
)
```

Retrieval:

```python
retriever = db.as_retriever(search_kwargs={"k": 2})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
print(f"qa_chain--{qa_chain}")
response = qa_chain({"query": question})
return response["result"]
```

I get the error at the line `response = qa_chain({"query": question})`. Can somebody please help me see what is wrong with the above?
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Love you bro
> @meal the embeddings function looks like: `embeddingsFunction = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)`
> I have copied the code again into another Jupyter notebook and it worked without any issues :/ Same code... Maybe some local cache issue?

The same happened to me.
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Fair explanation, thank you!
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

🐐 It worked, thank you!!
> @qayyumabro If you can't find the directory where your index/collection is stored in order to remove it, you can use a workaround that works for me.

This works like a charm!
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

Can confirm that simply removing the stored DB tables will clear this error; it means you had previously run embeddings using a different strategy than the current one.

Also, if you are doing some trial and error with your code and don't want to delete the folder every time, you can opt for not persisting the database. For example, instead of a line such as

```python
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")
```

you could use this:

```python
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings)
```
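Another option, if you do want persistence across runs: derive the persist directory from the embedding model name, so indexes built with different models never share a folder. This is just a sketch of a naming scheme (the names are my own suggestion, not a langchain/Chroma convention), reusing the `Chroma.from_documents` call shown above:

```python
# Hypothetical naming scheme: one persist directory per embedding model, so
# indexes with different dimensionalities never end up in the same folder.
model_name = "text-embedding-ada-002"
persist_directory = f"chroma_db_{model_name.replace('-', '_')}"
print(persist_directory)  # -> chroma_db_text_embedding_ada_002

# vectordb = Chroma.from_documents(
#     documents=all_splits, embedding=embeddings, persist_directory=persist_directory
# )
```

Switching models then simply switches directories, instead of tripping over a stale index built with the old dimensionality.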
> I had the same issue before. This is probably caused by having the embeddings with different dimensions already stored inside the chroma db. I fixed that by removing the chroma db folder which contains the stored embeddings.

You saved my life!!! Thank you!!!