[BUG] CrewAI ChromaDB Embedding Dimension Mismatch Issue
Description
When using CrewAI with knowledge sources, I'm encountering an embedding dimension mismatch error if I've previously used a different embedding model in the same project. This appears to happen because CrewAI uses ChromaDB as its default vector database, and ChromaDB enforces consistent embedding dimensions across operations.
[ERROR]: Embedding dimension mismatch. This usually happens when mixing different embedding models.
Try resetting the collection using `crewai reset-memories -a`
ValueError: Invalid Knowledge Configuration: Embedding dimension mismatch. Make sure you're using the same embedding model across all operations with this collection.
Try resetting the collection using `crewai reset-memories -a`
The issue shows up as a dimension mismatch error (e.g., 768 vs 1536) between current embeddings and previously stored embeddings.
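The check behind this error can be illustrated with a toy model. This sketch is not ChromaDB's code: `ToyCollection` and `DimensionMismatch` are hypothetical names, and it assumes only what the traceback in this report shows, namely that a collection keeps the dimensionality of the vectors first written to it and rejects later vectors of a different length.

```python
class DimensionMismatch(ValueError):
    """Raised when a vector's length differs from the collection's."""


class ToyCollection:
    """Toy stand-in for a vector collection (hypothetical, not ChromaDB)."""

    def __init__(self):
        self.dimension = None  # fixed by the first vector that is stored
        self.vectors = {}

    def upsert(self, ids, embeddings):
        for id_, vec in zip(ids, embeddings):
            if self.dimension is None:
                self.dimension = len(vec)
            elif len(vec) != self.dimension:
                raise DimensionMismatch(
                    f"Embedding dimension {len(vec)} does not match "
                    f"collection dimensionality {self.dimension}"
                )
            self.vectors[id_] = list(vec)


collection = ToyCollection()
collection.upsert(["a"], [[0.0] * 1536])      # first run: 1536-dimension model
try:
    collection.upsert(["b"], [[0.0] * 768])   # second run: 768-dimension model
except DimensionMismatch as exc:
    message = str(exc)
```

The second `upsert` fails because the stored data, not the current configuration, decides which dimension is valid. That is why changing the embedder without clearing the persisted collection triggers the error.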
Steps to Reproduce
- Create a CrewAI project with agents that use knowledge sources
- Run the project with one embedding model (e.g., OpenAI's model with 1536 dimensions)
- Change the embedding model to a different one (e.g., Ollama's nomic-embed-text with 768 dimensions)
- Run the project again without clearing previous embeddings
Expected behavior
The project should either:
- Detect the embedding model change and automatically reset collections
- Convert embeddings to be compatible
- Provide a clearer error message with automated recovery
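The first option could be approximated outside CrewAI with a small guard that runs before the crew is built. This is a minimal sketch under stated assumptions, not a CrewAI feature: `embedder_fingerprint`, `reset_if_embedder_changed`, and the marker file name are hypothetical, and `storage_dir` stands for whatever directory holds the persisted collections.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def embedder_fingerprint(embedder_config: dict) -> str:
    """Stable hash of the embedder configuration (hypothetical helper)."""
    canonical = json.dumps(embedder_config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reset_if_embedder_changed(storage_dir: Path, embedder_config: dict) -> bool:
    """Wipe storage_dir when the recorded embedder differs from the current one.

    Returns True if the directory was wiped. A directory without a marker
    file is treated as unchanged, so pre-existing data is never deleted
    on the first call.
    """
    marker = storage_dir / "embedder_fingerprint.txt"  # assumed marker name
    current = embedder_fingerprint(embedder_config)
    previous = marker.read_text().strip() if marker.exists() else None
    changed = previous is not None and previous != current
    if changed:
        shutil.rmtree(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(current)
    return changed


# Demonstration in a temporary directory, so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "storage"
    openai_cfg = {"provider": "openai"}
    ollama_cfg = {"provider": "ollama", "config": {"model": "nomic-embed-text"}}

    first = reset_if_embedder_changed(storage, openai_cfg)   # no marker yet
    (storage / "old.bin").write_text("stale embeddings")
    second = reset_if_embedder_changed(storage, openai_cfg)  # same embedder
    third = reset_if_embedder_changed(storage, ollama_cfg)   # embedder changed
    stale_survived = (storage / "old.bin").exists()
```

Hashing the whole configuration means any change, including a different model from the same provider, counts as a new embedder. The cost is that an unrelated setting such as a changed URL also triggers a reset.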
Current Behavior
The project fails with a ChromaDB dimension mismatch error. The error is confusing because nothing clearly indicates that CrewAI uses ChromaDB under the hood.
I also tried the suggested command `crewai reset-memories -a`, but it did not resolve the error.
Help Needed
Has anyone encountered this issue and found a reliable solution? I need a way to either:
- Properly reset the ChromaDB collections
- Configure CrewAI to use a different vector database
- Ensure consistent embedding dimensions across runs
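For the third option, one possible approach is to derive the collection name from the embedder configuration, so that vectors from different models never share a collection. The sketch below is an assumption-laden illustration: `collection_name_for` is a hypothetical helper, and this thread does not confirm whether CrewAI honors a per-source `collection_name`.

```python
import hashlib
import json


def collection_name_for(base: str, embedder_config: dict) -> str:
    """Append a short hash of the embedder config to a base collection name."""
    canonical = json.dumps(embedder_config, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{base}_{digest}"


ollama_cfg = {
    "provider": "ollama",
    "config": {"model": "nomic-embed-text", "api_url": "http://localhost:11434"},
}
openai_cfg = {"provider": "openai"}

name_ollama = collection_name_for("collection", ollama_cfg)
name_ollama_again = collection_name_for("collection", dict(ollama_cfg))
name_openai = collection_name_for("collection", openai_cfg)
```

Unlike a timestamp suffix, this name is the same on every run with the same embedder, so existing embeddings are reused, while a different embedder maps to a different name.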
Screenshots/Code snippets
# My embedding configuration
embedder_config = {
    "provider": "ollama",
    "config": {
        "model": "nomic-embed-text",
        "api_url": "http://localhost:11434",
    },
}

# Knowledge source initialization
data_knowledge_source = JSONKnowledgeSource(
    file_paths=["data_source.json"],
    embedder=embedder_config,
    collection_name=f"collection_{timestamp}",
)

# Create generic agents
data_analyst = Agent(
    role="Data Analyst",
    goal="Analyze data and extract insights",
    backstory="Experienced data analyst with expertise in pattern recognition",
    tools=[data_tool],
    knowledge_sources=[data_knowledge_source],
    verbose=False,
)

report_writer = Agent(
    role="Report Writer",
    goal="Create comprehensive reports from data analysis",
    backstory="Expert in creating clear, actionable reports",
    tools=[data_tool],
    knowledge_sources=[data_knowledge_source],
    verbose=False,
)

# Crew setup with knowledge sources
crew = Crew(
    agents=[data_analyst, report_writer],
    tasks=[analyze_task, report_task],
    process=Process.sequential,
    verbose=True,
    embedder=embedder_config,
    memory=True,
    short_term_memory=ShortTermMemory(
        storage=RAGStorage(
            embedder_config=embedder_config,
            type="short_term",
            path="db/memory.json",
        ),
    ),
    knowledge_sources=[data_knowledge_source],
)

# Reset attempt that doesn't work
crew.reset_memories(command_type="all")
### Operating System
macOS Sonoma
### Python Version
3.12
### crewAI Version
0.108.0
### crewAI Tools Version
0.38.1
### Virtual Environment
Venv
### Evidence
[2025-03-25 14:30:17][ERROR]: Embedding dimension mismatch. This usually happens when mixing different embedding models. Try resetting the collection using crewai reset-memories -a
╭─────────────────────────────────────────────────────────────────────────────── Crew Failure ───────────────────────────────────────────────────────────────────────────────╮
│ │
│ Crew Execution Failed │
│ Name: crew │
│ ID: d1616bf2-90bc-44c4-a32d-4042b318482b │
│ │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Traceback (most recent call last):
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 161, in save
    self.collection.upsert(
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/models/Collection.py", line 343, in upsert
    self._client._upsert(
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 103, in wrapper
    return self._rate_limit_enforcer.rate_limit(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/rate_limit/simple_rate_limit/__init__.py", line 23, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 536, in _upsert
    self._validate_embedding_record_set(coll, records_to_submit)
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 864, in _validate_embedding_record_set
    self._validate_dimension(
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 881, in _validate_dimension
    raise InvalidDimensionException(
chromadb.errors.InvalidDimensionException: Embedding dimension 768 does not match collection dimensionality 1536
### Possible Solution
None
### Additional context
This issue typically happens when:
1. Switching between embedding providers (OpenAI to local models or vice versa)
2. Changing embedding models within the same provider
3. Testing different configurations with the same codebase
Hi @amdjedbens,
the `crewai reset-memories -a` command will be fixed soon.
Check out PR #2312. The changes are small, so you can pull them locally while the PR is waiting to be merged. Let me know if this works.
@Vidit-Ostwal so can we close this PR?
Hi @lucasgomide, yes I think we can close this PR.
@Vidit-Ostwal can you mention on #2312 that it solves this issue for tracking purposes?
Following up on this issue: the CLI command `crewai reset-memories -a` is still not working for me, even after updating to the latest version (PR #2312).
Since I'm using macOS, I had to take the following steps to resolve the problem:
If you're changing the embedder model and encounter the error [ERROR]: Embedding dimension mismatch, you'll need to reset the memory. However, because the CLI tool command crewai reset-memories -a isn't functional for me, I had to manually delete the SQLite database that's used for storing memories.
Location of the database file:
/Users/your-user-name/Library/Application Support/name-of-your-project
You'll need to navigate to this location manually and delete the file yourself.
I hope this workaround helps anyone facing the same problem!
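The manual cleanup described above could also be scripted. This is only a sketch: `delete_storage` is a hypothetical helper, the file names in the demonstration are made up, and it assumes that removing the whole project storage directory (rather than a single file) is acceptable. It defaults to a dry run so the contents can be reviewed first.

```python
import shutil
import tempfile
from pathlib import Path


def delete_storage(storage_dir: Path, dry_run: bool = True) -> list[str]:
    """Return the entries under storage_dir and, unless dry_run, delete the directory."""
    if not storage_dir.is_dir():
        return []
    entries = sorted(p.name for p in storage_dir.iterdir())
    if not dry_run:
        shutil.rmtree(storage_dir)
    return entries


# Demonstration against a temporary directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp) / "name-of-your-project"
    project_dir.mkdir()
    (project_dir / "knowledge").mkdir()
    (project_dir / "memory.db").write_text("stale")

    preview = delete_storage(project_dir)                  # dry run, nothing removed
    still_there = project_dir.exists()
    removed = delete_storage(project_dir, dry_run=False)   # actually deletes
    gone = not project_dir.exists()
```

On macOS, the location given in this comment would be passed as `Path.home() / "Library" / "Application Support" / "name-of-your-project"`, with the last component replaced by the actual project name.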
@amdjedbens, can you update your crewai version and check whether this works now?
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.