[BUG] DirectorySearchTool fails to configure Ollama embedding model due to config format mismatch

Open benni82 opened this issue 2 months ago • 0 comments

Description

When DirectorySearchTool is configured with the Ollama embedding provider, the nested embedding configuration is flattened before it reaches the embedder factory. The factory looks for a nested "config" key, finds none, and passes an empty dictionary to OllamaProvider instead of the expected configuration.

Error Message

1 validation error for DirectorySearchTool
EMBEDDINGS_OLLAMA_MODEL_NAME
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.12/v/missing

Steps to Reproduce

  1. Install Ollama and pull an embedding model:

    ollama pull nomic-embed-text
    
  2. Configure DirectorySearchTool with Ollama embedding:

    from crewai_tools import DirectorySearchTool
    
    config = {
        "embedding_model": {
            "provider": "ollama",
            "config": {
                "model": "nomic-embed-text",
                "url": "http://localhost:11434/api/embeddings"
            }
        }
    }
    
    tool = DirectorySearchTool(
        directory="/path/to/directory",
        config=config
    )
    
  3. The error occurs during tool initialization.

Expected behavior

DirectorySearchTool should initialize successfully with the Ollama embedding configuration, and OllamaProvider should receive the model_name and url parameters.

Screenshots/Code snippets

The configuration is lost during the transformation process, and OllamaProvider receives an empty dictionary {}, causing a Pydantic validation error because the required model_name field is missing.

Operating System

macOS Sonoma

Python Version

3.10

crewAI Version

1.3.0

crewAI Tools Version

1.3.0

Virtual Environment

Venv

Possible Solution

Root Cause Analysis

The issue is in the RagTool._create_embedding_function() method in lib/crewai-tools/src/crewai_tools/tools/rag/rag_tool.py.

Current Implementation (Line 125):

factory_config = {"provider": embedding_provider, **embedding_model_config}

This creates a flat configuration:

{
    "provider": "ollama",
    "model_name": "nomic-embed-text",
    "url": "http://localhost:11434/api/embeddings"
}

Format Expected by build_embedder_from_dict():

The build_embedder_from_dict() function in lib/crewai/src/crewai/rag/embeddings/factory.py (line 250) expects a nested configuration:

{
    "provider": "ollama",
    "config": {
        "model_name": "nomic-embed-text",
        "url": "http://localhost:11434/api/embeddings"
    }
}

The Problem:

  1. _create_embedding_function() creates a flat config: {"provider": "ollama", "model_name": "...", "url": "..."}
  2. build_embedder_from_dict() calls spec.get("config", {}), expecting the nested format
  3. Since the flat format has no "config" key, the call returns an empty dict {}
  4. OllamaProvider(**{}) is instantiated with no parameters
  5. Pydantic validation fails because model_name is required
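
The five steps above can be reproduced without crewAI installed. The following is a minimal sketch using plain dicts: StandInProvider and extract_provider_kwargs are hypothetical names standing in for OllamaProvider and the spec.get("config", {}) lookup, and a plain dataclass raises TypeError where the real provider raises a Pydantic validation error.

```python
from dataclasses import dataclass


@dataclass
class StandInProvider:
    """Hypothetical stand-in for OllamaProvider: model_name is required."""

    model_name: str
    url: str = "http://localhost:11434/api/embeddings"


def extract_provider_kwargs(spec: dict) -> dict:
    """Mimic the lookup from step 2: read only the nested "config" key."""
    return spec.get("config", {})


flat_spec = {
    "provider": "ollama",
    "model_name": "nomic-embed-text",
    "url": "http://localhost:11434/api/embeddings",
}
nested_spec = {
    "provider": "ollama",
    "config": {
        "model_name": "nomic-embed-text",
        "url": "http://localhost:11434/api/embeddings",
    },
}

flat_kwargs = extract_provider_kwargs(flat_spec)      # no "config" key, so {}
nested_kwargs = extract_provider_kwargs(nested_spec)  # the inner dict

try:
    StandInProvider(**flat_kwargs)
    flat_error = None
except TypeError as exc:
    # The real provider would fail with a Pydantic validation error here.
    flat_error = exc

# The nested spec supplies model_name, so construction succeeds.
provider = StandInProvider(**nested_kwargs)
```

The flat spec still contains model_name and url at the top level, but the lookup never reads them, which is why the provider sees an empty dictionary.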

Proposed Fix

Modify _create_embedding_function() in rag_tool.py to maintain the nested configuration format:

@staticmethod
def _create_embedding_function(embedding_config: dict, provider: str) -> Any:
    """Create embedding function for the specified vector database provider."""
    embedding_provider = embedding_config.get("provider")
    embedding_model_config = embedding_config.get("config", {}).copy()

    if "model" in embedding_model_config:
        embedding_model_config["model_name"] = embedding_model_config.pop("model")

    # Fix: Create nested format instead of flat format
    factory_config = {
        "provider": embedding_provider,
        "config": embedding_model_config  # Keep nested structure
    }

    if embedding_provider == "openai" and "api_key" not in embedding_model_config:
        api_key = os.getenv("OPENAI_API_KEY")
        if api_key:
            factory_config["config"]["api_key"] = api_key

    if provider == "chromadb":
        return get_embedding_function(factory_config)  # type: ignore[call-overload]

    if provider == "qdrant":
        chromadb_func = get_embedding_function(factory_config)  # type: ignore[call-overload]

        def qdrant_embed_fn(text: str) -> list[float]:
            """Embed text using ChromaDB function and convert to list of floats for Qdrant."""
            embeddings = chromadb_func([text])
            return embeddings[0] if embeddings and len(embeddings) > 0 else []

        return cast(Any, qdrant_embed_fn)

    return None
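
To check the shape the proposed fix produces, the config-building part can be isolated into a pure function. This is a sketch under assumptions, not the project's code: build_factory_config is a hypothetical helper, the environment is passed in as a dict (os.environ in real use) so the check does not depend on the shell, and the OpenAI model name and key are illustrative values.

```python
def build_factory_config(embedding_config: dict, env: dict) -> dict:
    """Hypothetical helper: build the nested spec without mutating the input."""
    provider = embedding_config.get("provider")
    # Shallow copy so the rename below does not touch the caller's dict.
    model_config = dict(embedding_config.get("config", {}))

    # Same rename as in the proposed fix: "model" becomes "model_name".
    if "model" in model_config:
        model_config["model_name"] = model_config.pop("model")

    # Same OpenAI fallback as in the proposed fix, reading from the given env.
    if provider == "openai" and "api_key" not in model_config:
        api_key = env.get("OPENAI_API_KEY")
        if api_key:
            model_config["api_key"] = api_key

    # Keep the nested structure that the factory is described as expecting.
    return {"provider": provider, "config": model_config}


user_config = {
    "provider": "ollama",
    "config": {
        "model": "nomic-embed-text",
        "url": "http://localhost:11434/api/embeddings",
    },
}

factory_config = build_factory_config(user_config, env={})
openai_config = build_factory_config(
    {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
    env={"OPENAI_API_KEY": "sk-example"},  # placeholder, not a real key
)
```

With this shape, a lookup of the "config" key returns the renamed model_name and the url, and the user's original dictionary keeps its "model" key.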

Affected Files

  • lib/crewai-tools/src/crewai_tools/tools/rag/rag_tool.py (line 125)
  • Potentially affects all embedding providers when used with DirectorySearchTool or other RagTool subclasses

benni82 · Nov 03 '25 02:11