feat(vector_io): add custom collection names support for vector stores
What does this PR do?
This PR fixes an issue where custom collection names for vector stores were not being correctly utilized by the underlying storage providers, resulting in a mismatch between the logical API identifier (UUID) and the physical storage identifier.
Specifically, it:
- Updates the VectorIORouter to pass the canonical vector_store_id (UUID) to the provider in model_extra.
- Updates OpenAIVectorStoreMixin to use the provided vector_store_id as the logical identifier for the VectorStore resource, while retaining provider_vector_store_id (custom name) as the provider_resource_id.
- Updates inline providers (sqlite_vec, faiss) to use the provider_resource_id (if available) as the physical storage identifier (e.g., table name, bank ID).
This ensures that when a user specifies a collection_name, it is used for the physical storage while the API continues to return the expected UUID format, resolving the discrepancy and ensuring correct routing and storage.
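The split between the logical and physical identifier can be illustrated with a minimal sketch. This is not the code in this PR: the helper `resolve_ids` and the `VectorStoreIds` container are hypothetical names introduced only for illustration, and the fallback to an auto-generated `vs_`-prefixed UUID is an assumption based on the behavior described above.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorStoreIds:
    # Logical ID that the API returns to the client (UUID form).
    identifier: str
    # Physical storage name used by the provider (table name, bank ID, ...).
    provider_resource_id: str


def resolve_ids(
    vector_store_id: Optional[str],
    provider_vector_store_id: Optional[str],
) -> VectorStoreIds:
    """Hypothetical helper: choose the logical and physical identifiers.

    vector_store_id: canonical ID passed down by the router, if any.
    provider_vector_store_id: custom collection name, if the user gave one.
    """
    # Assumption: without a router-supplied ID, generate a fresh one.
    logical = vector_store_id or f"vs_{uuid.uuid4()}"
    # Use the custom name for storage only when it is available;
    # otherwise the physical ID falls back to the logical ID.
    physical = provider_vector_store_id or logical
    return VectorStoreIds(identifier=logical, provider_resource_id=physical)
```

With a custom name, `resolve_ids("vs_abc123", "my_docs")` keeps `vs_abc123` as the API-facing identifier and uses `my_docs` for storage; without one, both identifiers are the same, which corresponds to the backward-compatible path.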
Closes #4135
Test Plan
- Added 3 comprehensive integration tests to tests/integration/vector_io/test_openai_vector_stores.py:
  - test_openai_vector_store_custom_collection_name: Validates custom collection name creation and metadata storage
  - test_openai_vector_store_collection_name_validation: Validates input sanitization (alphanumeric, hyphens, underscores only)
  - test_openai_vector_store_collection_name_with_data: Validates data insertion and search operations with custom collection names
- Verified ID synchronization across all layers:
  - Client receives UUID (vs_abc123)
  - Router maps UUID to custom collection name
  - Provider uses UUID for routing, custom name for physical storage
  - Physical storage (SQLite, FAISS, etc.) uses custom collection name
- Ensured backward compatibility: existing code without collection_name continues to use auto-generated UUIDs
- Tested with ollama, nomic-embed-text:latest, sqlite
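For reviewers, the character rule exercised by the validation test (alphanumeric, hyphens, underscores only) can be sketched as below. The function name `is_valid_collection_name` is hypothetical, and this sketch only checks a name; whether the actual implementation rejects invalid names or rewrites them is not stated in this description.

```python
import re

# Allowed characters per the test plan: letters, digits, hyphens, underscores.
_COLLECTION_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_collection_name(name: str) -> bool:
    """Hypothetical check: True only if every character is allowed.

    An empty string is treated as invalid (assumption).
    """
    return _COLLECTION_NAME_RE.fullmatch(name) is not None
```

Under this rule, a name such as `my-docs_v2` passes, while names containing spaces or punctuation such as `;` do not. This matters because the custom name ends up as a physical identifier such as a table name.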
The changes resolve the ID synchronization issue and enable users to specify meaningful collection names for easier management of vector stores.