
feat(vector_io): add custom collection names support for vector stores

Open r-bit-rry opened this issue 1 month ago • 0 comments

What does this PR do?

This PR fixes an issue where custom collection names for vector stores were not used by the underlying storage providers, leaving the logical API identifier (UUID) out of sync with the physical storage identifier.

Specifically, it:

  1. Updates the VectorIORouter to pass the canonical vector_store_id (UUID) to the provider in model_extra.
  2. Updates OpenAIVectorStoreMixin to use the provided vector_store_id as the logical identifier for the VectorStore resource, while retaining provider_vector_store_id (custom name) as the provider_resource_id.
  3. Updates inline providers (sqlite_vec, faiss) to use the provider_resource_id (if available) as the physical storage identifier (e.g., table name, bank ID).

This ensures that when a user specifies a collection_name, it is used for the physical storage while the API continues to return the expected UUID format, resolving the discrepancy and ensuring correct routing and storage.
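The split between the logical and physical identifiers can be illustrated with a minimal sketch. This is not the code from the PR: `VectorStoreRecord`, `register_vector_store`, and `physical_name` are hypothetical names chosen for illustration, and the `vs_` prefix is assumed from the example ID in the test plan.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorStoreRecord:
    # Hypothetical record type, not llama-stack's actual VectorStore class.
    identifier: str            # logical ID returned by the API
    provider_resource_id: str  # name used for physical storage (table, bank ID)


def register_vector_store(collection_name: Optional[str] = None) -> VectorStoreRecord:
    """Create a record whose API-facing ID is always auto-generated.

    If the caller supplies a collection name, it becomes the physical
    storage name; otherwise the generated ID is reused for storage,
    which mirrors the backward-compatible behavior described above.
    """
    vector_store_id = f"vs_{uuid.uuid4()}"  # assumed ID format
    return VectorStoreRecord(
        identifier=vector_store_id,
        provider_resource_id=collection_name or vector_store_id,
    )


def physical_name(record: VectorStoreRecord) -> str:
    # Providers would use this value for the table name or bank ID.
    return record.provider_resource_id or record.identifier
```

With this shape, `register_vector_store("my_docs")` yields a record whose `identifier` still looks like a generated ID while `physical_name` returns `my_docs`; calling it with no argument makes both values identical.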

Closes #4135

Test Plan

  • Added three integration tests to tests/integration/vector_io/test_openai_vector_stores.py:

    • test_openai_vector_store_custom_collection_name: Validates custom collection name creation and metadata storage
    • test_openai_vector_store_collection_name_validation: Validates input sanitization (alphanumeric, hyphens, underscores only)
    • test_openai_vector_store_collection_name_with_data: Validates data insertion and search operations with custom collection names
  • Verified ID synchronization across all layers:

    • Client receives UUID (vs_abc123)
    • Router maps UUID to custom collection name
    • Provider uses UUID for routing, custom name for physical storage
    • Physical storage (SQLite, FAISS, etc.) uses custom collection name
  • Ensured backward compatibility: existing code without collection_name continues to use auto-generated UUIDs

  • Tested with ollama, nomic-embed-text:latest, sqlite
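The sanitization rule exercised by test_openai_vector_store_collection_name_validation (alphanumeric characters, hyphens, and underscores only) could be checked along these lines. This is a sketch under assumptions: `is_valid_collection_name` is a hypothetical helper, and the PR's actual validation may apply additional constraints, such as a length limit, that are not stated here.

```python
import re

# Letters, digits, hyphens, and underscores only; at least one character.
_COLLECTION_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_collection_name(name: str) -> bool:
    """Return True if the whole name consists of allowed characters.

    fullmatch is used instead of match with a trailing "$" so that a
    name ending in a newline is rejected rather than silently accepted.
    """
    return _COLLECTION_NAME_RE.fullmatch(name) is not None
```

Restricting names to this character set matters because the name ends up in a physical identifier such as a SQLite table name, where spaces, quotes, or semicolons would be unsafe.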

The changes resolve the ID synchronization issue and enable users to specify meaningful collection names for easier management of vector stores.

r-bit-rry, Nov 20 '25 17:11