
Pydantic Validation Error in VectorStoreSearchResponse with List-Type Metadata

Open: zanetworker opened this issue 3 months ago

System Info

  • Llama Stack Version: 0.2.23
  • Python Version: 3.11
  • Vector Store Backend: FAISS (affects all backends)

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

🐛 Describe the bug

Vector store search fails with a Pydantic validation error when chunk metadata contains list-type values (e.g., tags) or other non-primitive values. The full error logs below also show last_modified failing. The cause is a schema mismatch. VectorStoreSearchResponse restricts attributes to primitive values (str | float | bool), while the input Chunk.metadata accepts values of any type (dict[str, Any]).

Steps to Reproduce

  1. Ingest documents with metadata containing lists:
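from llama_stack.apis.vector_io.vector_io import Chunk  # module path as cited under Root Cause

# Chunk.metadata accepts arbitrary values, including lists
chunks = [
    Chunk(
        content="Model information...",
        metadata={
            "tags": ["transformers", "h100-compatible", "region:us"],  # list value
            "model_name": "granite-3.3-8b",
        },
    )
]
# vector_io and vector_db_id refer to an existing VectorIO handle and a registered vector DB
await vector_io.insert_chunks(vector_db_id, chunks)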
  2. Search the vector store:
response = await vector_io.openai_search_vector_store(
    vector_store_id="vs_...",
    query="models compatible with H100 GPU",
    max_num_results=10
)

Actual Behavior

The search returns empty results and logs show a Pydantic validation error:

[ERROR] Error searching vector store vs_6ce6f6c8-09b9-4e54-a4c5-3a78f7688805: 6 validation errors for VectorStoreSearchResponse
attributes.tags.str
  Input should be a valid string [type=string_type, input_value=['transformers', 'safeten...ompatible', 'region:us'], input_type=list]
attributes.tags.float
  Input should be a valid number [type=float_type, input_value=['transformers', 'safeten...ompatible', 'region:us'], input_type=list]
attributes.tags.bool
  Input should be a valid boolean [type=bool_type, input_value=['transformers', 'safeten...ompatible', 'region:us'], input_type=list]

Expected Behavior

The search should return results with metadata intact, supporting the same flexible metadata types at retrieval that are accepted at ingestion.

Root Cause

Schema Mismatch:

Input Schema (llama_stack/apis/vector_io/vector_io.py:71):

class Chunk(BaseModel):
    metadata: dict[str, Any] = Field(default_factory=dict)  # ✅ Accepts ANY type

Output Schema (llama_stack/apis/vector_io/vector_io.py:250):

class VectorStoreSearchResponse(BaseModel):
    attributes: dict[str, str | float | bool] | None = None  # ❌ Only primitives

Direct Pass-through (llama_stack/providers/utils/memory/openai_vector_store_mixin.py:606):

response_data_item = VectorStoreSearchResponse(
    file_id=chunk.metadata.get("document_id", ""),
    filename=chunk.metadata.get("filename", ""),
    score=score,
    attributes=chunk.metadata,  # ❌ No transformation/validation
    content=content,
)

Error logs

[ERROR] Error searching vector store
         vs_6ce6f6c8-09b9-4e54-a4c5-3a78f7688805: 6 validation errors for VectorStoreSearchResponse
         attributes.tags.str
           Input should be a valid string [type=string_type, input_value=['transformers', 'safeten...ompatible',
         'region:us'], input_type=list]
             For further information visit https://errors.pydantic.dev/2.12/v/string_type
         attributes.tags.float
           Input should be a valid number [type=float_type, input_value=['transformers', 'safeten...ompatible',
         'region:us'], input_type=list]
             For further information visit https://errors.pydantic.dev/2.12/v/float_type
         attributes.tags.bool
           Input should be a valid boolean [type=bool_type, input_value=['transformers', 'safeten...ompatible',
         'region:us'], input_type=list]
             For further information visit https://errors.pydantic.dev/2.12/v/bool_type
         attributes.last_modified.str
           Input should be a valid string
             For further information visit https://errors.pydantic.dev/2.12/v/string_type
         attributes.last_modified.float
           Input should be a valid number
             For further information visit https://errors.pydantic.dev/2.12/v/float_type
         attributes.last_modified.bool
           Input should be a valid boolean
             For further information visit https://errors.pydantic.dev/2.12/v/bool_type
INFO     2025-10-12 18:49:26,916 console_span_processor:39 telemetry: 16:49:26.915 [END]

Expected behavior

No error; the search works.

zanetworker, Oct 12 '25 17:10

In addition to the fix in the PR, users can work around the issue by storing list values as comma-separated strings, e.g. {"tags": "tag0,tag1"}, and splitting them again after retrieval with output.split(',').
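The workaround above can be sketched on the client side as a pair of helpers. Both encode_list_values and decode_list_values are hypothetical names. They assume two things: list items never contain commas, and the caller knows which keys held lists. The encoded dict would be passed as Chunk(metadata=...) at ingestion, and the decoder would be applied to each result's attributes after search.

```python
from typing import Any


def encode_list_values(metadata: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: store list values as comma-separated strings before ingestion."""
    return {
        key: ",".join(str(item) for item in value) if isinstance(value, list) else value
        for key, value in metadata.items()
    }


def decode_list_values(attributes: dict[str, Any], list_keys: set[str]) -> dict[str, Any]:
    """Hypothetical helper: split the known list-valued keys back after search."""
    decoded = dict(attributes)
    for key in list_keys:
        value = decoded.get(key)
        if isinstance(value, str):
            # Treat an empty string as an empty list rather than [""]
            decoded[key] = value.split(",") if value else []
    return decoded


stored = encode_list_values({"tags": ["tag0", "tag1"], "model_name": "granite-3.3-8b"})
print(stored)  # {'tags': 'tag0,tag1', 'model_name': 'granite-3.3-8b'}
print(decode_list_values(stored, {"tags"})["tags"])  # ['tag0', 'tag1']
```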

r-bit-rry, Nov 18 '25 15:11

Fixed by https://github.com/llamastack/llama-stack/pull/4173?

nathan-weinberg, Dec 03 '25 16:12