
[BUG] CrewAI ChromaDB Embedding Dimension Mismatch Issue

Open amdjedbens opened this issue 8 months ago • 6 comments

Description

When using CrewAI with knowledge sources, I'm encountering an embedding dimension mismatch error if I've previously used a different embedding model in the same project. This appears to happen because CrewAI uses ChromaDB as its default vector database, and ChromaDB enforces consistent embedding dimensions across operations.

[ERROR]: Embedding dimension mismatch. This usually happens when mixing different embedding models.
Try resetting the collection using `crewai reset-memories -a`

ValueError: Invalid Knowledge Configuration: Embedding dimension mismatch. Make sure you're using the same embedding model across all operations with this collection.
Try resetting the collection using `crewai reset-memories -a`

The issue shows up as a dimension mismatch error (e.g., 768 vs 1536) between current embeddings and previously stored embeddings.
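To illustrate the failure mode, here is a toy sketch in plain Python of a collection that fixes its dimensionality on the first write and rejects any later vector of a different length. `ToyCollection` and `DimensionMismatch` are hypothetical names for illustration only, not ChromaDB's API; the real check happens inside ChromaDB, as the traceback under Evidence shows.

```python
from typing import Optional


class DimensionMismatch(ValueError):
    """Raised when a vector's length differs from the collection's."""


class ToyCollection:
    """Hypothetical stand-in for a vector collection; not ChromaDB's API."""

    def __init__(self) -> None:
        self.dimension: Optional[int] = None  # unset until the first write
        self.vectors: dict = {}

    def upsert(self, doc_id: str, embedding: list) -> None:
        if self.dimension is None:
            # The first stored vector fixes the dimensionality.
            self.dimension = len(embedding)
        elif len(embedding) != self.dimension:
            raise DimensionMismatch(
                f"Embedding dimension {len(embedding)} does not match "
                f"collection dimensionality {self.dimension}"
            )
        self.vectors[doc_id] = embedding


collection = ToyCollection()
collection.upsert("doc-1", [0.0] * 1536)  # first run: a 1536-dimensional model
try:
    collection.upsert("doc-2", [0.0] * 768)  # second run: a 768-dimensional model
except DimensionMismatch as exc:
    message = str(exc)
    print(message)
```

Because the stored vectors persist on disk between runs, the second run fails even though nothing is wrong with the new model's configuration.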

Steps to Reproduce

  1. Create a CrewAI project with agents that use knowledge sources
  2. Run the project with one embedding model (e.g., OpenAI's model with 1536 dimensions)
  3. Change the embedding model to a different one (e.g., Ollama's nomic-embed-text with 768 dimensions)
  4. Run the project again without clearing previous embeddings

Expected behavior

The project should either:

  • Detect the embedding model change and automatically reset collections
  • Convert embeddings to be compatible
  • Provide a clearer error message with automated recovery
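
The first option, detecting an embedder change and resetting automatically, could look roughly like the sketch below. This is not something CrewAI is known to do; it assumes the vector store lives in a directory you control, and the function names and the `embedder.sha256` marker file are hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path


def embedder_fingerprint(embedder_config: dict) -> str:
    """Stable hash of the embedder configuration (key order does not matter)."""
    canonical = json.dumps(embedder_config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reset_if_embedder_changed(storage_dir: Path, embedder_config: dict) -> bool:
    """Wipe storage_dir when the embedder differs from the recorded one.

    Returns True if stale data was removed, False otherwise.
    """
    marker = storage_dir / "embedder.sha256"  # hypothetical sidecar file
    current = embedder_fingerprint(embedder_config)
    previous = marker.read_text().strip() if marker.exists() else None

    wiped = False
    if previous is not None and previous != current:
        shutil.rmtree(storage_dir)  # drop vectors written by the old model
        wiped = True

    storage_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(current)
    return wiped
```

Calling `reset_if_embedder_changed(...)` before building the crew would clear stale vectors only when the configuration actually changed. A directory that already holds data but has no marker file is left untouched, so the first run after adopting this check still needs a manual reset.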

Current Behavior

The project fails with a cryptic ChromaDB error about dimension mismatch that is confusing since there's no clear indication that CrewAI is using ChromaDB under the hood.

I tried the suggested command `crewai reset-memories -a`, but it did not resolve the error.

Help Needed

Has anyone encountered this issue and found a reliable solution? I need a way to either:

  1. Properly reset the ChromaDB collections
  2. Configure CrewAI to use a different vector database
  3. Ensure consistent embedding dimensions across runs
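
One way to approach the third point is to derive the collection name from the embedder configuration, so that vectors from different models never land in the same collection. The sketch below is only an illustration: `collection_name_for` is a hypothetical helper, and the naming scheme is an assumption rather than a CrewAI convention.

```python
import hashlib
import json
import re


def collection_name_for(embedder_config: dict, prefix: str = "knowledge") -> str:
    """Build a collection name that changes whenever the embedder changes."""
    canonical = json.dumps(embedder_config, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    model = str(embedder_config.get("config", {}).get("model", "default"))
    # Keep only characters that are safe in a collection name, and cap the length.
    slug = re.sub(r"[^a-zA-Z0-9_-]", "-", model).strip("-")[:40] or "default"
    return f"{prefix}_{slug}_{digest}"
```

In the snippet under Screenshots/Code snippets, this would be passed as `collection_name=collection_name_for(embedder_config)` in place of the timestamp-based name. Unlike a timestamp, the name stays the same across runs with the same model, so existing embeddings are reused rather than rebuilt.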

Screenshots/Code snippets

# My embedding configuration
embedder_config = {
    "provider": "ollama",
    "config": {
        "model": "nomic-embed-text",
        "api_url": "http://localhost:11434",
    },
}

# Knowledge source initialization
data_knowledge_source = JSONKnowledgeSource(
    file_paths=["data_source.json"],
    embedder=embedder_config,
    collection_name=f"collection_{timestamp}"
)

# Create generic agents
data_analyst = Agent(
    role="Data Analyst",
    goal="Analyze data and extract insights",
    backstory="Experienced data analyst with expertise in pattern recognition",
    tools=[data_tool],
    knowledge_sources=[data_knowledge_source],
    verbose=False
)

report_writer = Agent(
    role="Report Writer",
    goal="Create comprehensive reports from data analysis",
    backstory="Expert in creating clear, actionable reports",
    tools=[data_tool],
    knowledge_sources=[data_knowledge_source],
    verbose=False
)

# Crew setup with knowledge sources
crew = Crew(
    agents=[data_analyst, report_writer],
    tasks=[analyze_task, report_task],
    process=Process.sequential,
    verbose=True,
    embedder=embedder_config,
    memory=True,
    short_term_memory=ShortTermMemory(
        storage=RAGStorage(
            embedder_config=embedder_config,
            type="short_term",
            path="db/memory.json"
        ),
    ),
    knowledge_sources=[data_knowledge_source],
)

# Reset attempt that doesn't work
crew.reset_memories(command_type="all")

### Operating System

macOS Sonoma

### Python Version

3.12

### crewAI Version

0.108.0

### crewAI Tools Version

0.38.1

### Virtual Environment

Venv

### Evidence

[2025-03-25 14:30:17][ERROR]: Embedding dimension mismatch. This usually happens when mixing different embedding models.
Try resetting the collection using crewai reset-memories -a

Crew Failure
  Crew Execution Failed
  Name: crew
  ID: d1616bf2-90bc-44c4-a32d-4042b318482b

Traceback (most recent call last):
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 161, in save
    self.collection.upsert(
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/models/Collection.py", line 343, in upsert
    self._client._upsert(
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 103, in wrapper
    return self._rate_limit_enforcer.rate_limit(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/rate_limit/simple_rate_limit/__init__.py", line 23, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 536, in _upsert
    self._validate_embedding_record_set(coll, records_to_submit)
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 864, in _validate_embedding_record_set
    self._validate_dimension(
  File "/Users/.pyenv/versions/3.12.3/lib/python3.12/site-packages/chromadb/api/segment.py", line 881, in _validate_dimension
    raise InvalidDimensionException(
chromadb.errors.InvalidDimensionException: Embedding dimension 768 does not match collection dimensionality 1536


### Possible Solution

None

### Additional context

This issue typically happens when:
1. Switching between embedding providers (OpenAI to local models or vice versa)
2. Changing embedding models within the same provider
3. Testing different configurations with the same codebase

amdjedbens avatar Mar 25 '25 13:03 amdjedbens

Hi @amdjedbens, the crewai reset-memories -a command will get fixed soon.

Check out PR #2312. The changes are small, so you can pull them locally while the PR is waiting to be merged. Let me know if this works.

Vidit-Ostwal avatar Mar 25 '25 16:03 Vidit-Ostwal

@Vidit-Ostwal, so can we close this PR?

lucasgomide avatar Mar 26 '25 14:03 lucasgomide

Hi @lucasgomide, yes I think we can close this PR.

Vidit-Ostwal avatar Mar 26 '25 15:03 Vidit-Ostwal

@Vidit-Ostwal can you mention on #2312 that it solves this issue for tracking purposes?

lucasgomide avatar Mar 26 '25 15:03 lucasgomide

Following up on this issue: the CLI command `crewai reset-memories -a` is still not working for me, even after updating to the latest version (PR #2312).

Since I'm using macOS, I had to take the following steps to resolve the problem:

If you change the embedder model and hit `[ERROR]: Embedding dimension mismatch`, you need to reset the memory. Because `crewai reset-memories -a` does not work for me, I manually deleted the SQLite database that stores the memories.

Location of the database file:

/Users/your-user-name/Library/Application Support/name-of-your-project

Navigate to this location manually and delete the database file yourself.
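
The manual cleanup can also be scripted. This is a minimal sketch that assumes the macOS location given in this comment; other platforms and CrewAI versions may store data elsewhere. `project_storage_dir` and `delete_storage` are hypothetical helper names, and the default is a dry run that only lists what would be removed.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def project_storage_dir(project_name: str, home: Optional[Path] = None) -> Path:
    """Location reported in this thread for macOS; other platforms will differ."""
    base = home if home is not None else Path.home()
    return base / "Library" / "Application Support" / project_name


def delete_storage(project_name: str, home: Optional[Path] = None,
                   dry_run: bool = True) -> List[Path]:
    """Return the stored entries and, unless dry_run is set, delete them."""
    target = project_storage_dir(project_name, home)
    if not target.is_dir():
        return []  # nothing stored for this project
    entries = sorted(target.iterdir())
    if not dry_run:
        shutil.rmtree(target)  # removes the whole project storage directory
    return entries
```

Running `delete_storage("name-of-your-project")` first shows what is stored; passing `dry_run=False` performs the deletion. Everything in that directory is removed, so all stored memories and knowledge embeddings have to be rebuilt on the next run.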

I hope this workaround helps anyone facing the same problem!

GabrielBoninUnity avatar Apr 19 '25 16:04 GabrielBoninUnity

@amdjedbens, Can you update the crewai version and check whether this is working or not?

Vidit-Ostwal avatar Apr 29 '25 05:04 Vidit-Ostwal

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar May 29 '25 12:05 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jun 03 '25 12:06 github-actions[bot]