spring-ai
Add support for 'halfvec' for vector embeddings and distance functions in PgVectorStore
Expected Behavior
When using the PgVector vector store (PgVectorStore), I would like to use the halfvec column type, because the embeddings produced by my local model require it.
Current Behavior
Currently, only the vector column type, and the distance functions that operate on it, are supported.
Context
The newer Gemma models produce embeddings with more than 2,000 dimensions. That is the maximum pgvector can index for the vector column type, which is used throughout org.springframework.ai.vectorstore.pgvector.PgVectorStore and its documentation; halfvec columns can be indexed up to 4,000 dimensions.
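A minimal sketch of how these limits decide the column type, assuming pgvector's documented HNSW/IVFFlat index limits (2,000 dimensions for vector, 4,000 for halfvec). The class and method names are hypothetical and not part of Spring AI:

```java
// Hypothetical helper (not part of Spring AI): picks a pgvector column type
// that can still be indexed for a given embedding dimension.
public final class PgVectorColumnTypes {

    // pgvector index limits: vector up to 2,000 dims, halfvec up to 4,000 dims.
    static final int MAX_INDEXED_VECTOR_DIMS = 2_000;
    static final int MAX_INDEXED_HALFVEC_DIMS = 4_000;

    private PgVectorColumnTypes() {
    }

    /** Returns a column type such as "vector(1536)" or "halfvec(2560)". */
    public static String columnTypeFor(int dimensions) {
        if (dimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive: " + dimensions);
        }
        if (dimensions <= MAX_INDEXED_VECTOR_DIMS) {
            return "vector(" + dimensions + ")";
        }
        if (dimensions <= MAX_INDEXED_HALFVEC_DIMS) {
            return "halfvec(" + dimensions + ")";
        }
        throw new IllegalArgumentException(
                dimensions + " dimensions exceeds the halfvec index limit of " + MAX_INDEXED_HALFVEC_DIMS);
    }

    public static void main(String[] args) {
        System.out.println(columnTypeFor(1536)); // vector(1536)
        System.out.println(columnTypeFor(2560)); // halfvec(2560)
    }
}
```

With a 2,560-dimensional model, as in the schema below, only halfvec stays within an indexable size.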
I can work around this by explicitly creating the schema via:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE IF NOT EXISTS vector_store (
    id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
    content text,
    metadata json,
    embedding halfvec(2560)
);
CREATE INDEX ON vector_store USING HNSW (embedding halfvec_l2_ops);
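The workaround schema can also be generated for other table names or dimensions. The following is a minimal sketch; the HalfvecSchema class and its ddl method are hypothetical, and identifiers are concatenated without escaping, so it assumes trusted input:

```java
import java.util.List;

// Hypothetical helper (not part of Spring AI): reproduces the workaround DDL
// for a chosen table name, halfvec dimension, and HNSW operator class.
// Identifiers are not quoted or escaped, so pass trusted values only.
public final class HalfvecSchema {

    private HalfvecSchema() {
    }

    public static List<String> ddl(String table, int dimensions, String opsClass) {
        if (dimensions < 1 || dimensions > 4_000) {
            // 4,000 is pgvector's HNSW/IVFFlat index limit for halfvec.
            throw new IllegalArgumentException("halfvec index limit is 4,000 dimensions: " + dimensions);
        }
        return List.of(
                "CREATE EXTENSION IF NOT EXISTS vector",
                "CREATE EXTENSION IF NOT EXISTS hstore",
                "CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\"",
                "CREATE TABLE IF NOT EXISTS " + table + " ("
                        + "id uuid DEFAULT uuid_generate_v4() PRIMARY KEY, "
                        + "content text, "
                        + "metadata json, "
                        + "embedding halfvec(" + dimensions + "))",
                // Same shape as the original statement: the index is unnamed,
                // so running this twice creates a second index.
                "CREATE INDEX ON " + table + " USING HNSW (embedding " + opsClass + ")");
    }

    public static void main(String[] args) {
        for (String sql : ddl("vector_store", 2560, "halfvec_l2_ops")) {
            System.out.println(sql + ";");
        }
    }
}
```

Each statement could then be executed, for example with Spring's JdbcTemplate.execute(String), before the store is used.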
However, PgVectorStore cannot use this table, because the PgVectorStore.PgDistanceType enumeration does not provide distance types with the index operator classes that halfvec columns require.
I've modified the code locally to add an appropriate enum constant:
public static enum PgDistanceType {
    HALFVEC_EUCLIDEAN_DISTANCE("<->", "halfvec_l2_ops", "SELECT *, embedding <-> ? AS distance FROM %s WHERE embedding <-> ? < ? %s ORDER BY distance LIMIT ? "),
    ...
This resolved the issue for me, but I think it should be added to the enum upstream, perhaps along with other types that may also need support here:
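A minimal sketch of halfvec counterparts for all three pgvector distance operators, written as a standalone enum so it compiles on its own. The enum name, field names, and indexDdl helper are hypothetical and not Spring AI's API. Only the Euclidean template is copied from the snippet above; the inner-product and cosine templates are written by analogy and have not been checked against Spring AI's own vector_* constants:

```java
// Hypothetical standalone enum, NOT Spring AI's PgVectorStore.PgDistanceType.
// Each constant pairs a pgvector operator with the matching halfvec HNSW
// operator class and a similarity-search SQL template. The first %s is the
// table name; the second %s appears to be an optional filter clause.
public enum HalfvecDistanceType {

    // Copied from the HALFVEC_EUCLIDEAN_DISTANCE constant in the issue.
    HALFVEC_EUCLIDEAN_DISTANCE("<->", "halfvec_l2_ops",
            "SELECT *, embedding <-> ? AS distance FROM %s WHERE embedding <-> ? < ? %s ORDER BY distance LIMIT ? "),

    // Assumption: written by analogy. <#> returns the NEGATIVE inner product,
    // so the "< ?" threshold may need different semantics here.
    HALFVEC_NEGATIVE_INNER_PRODUCT("<#>", "halfvec_ip_ops",
            "SELECT *, embedding <#> ? AS distance FROM %s WHERE embedding <#> ? < ? %s ORDER BY distance LIMIT ? "),

    // Assumption: written by analogy with the Euclidean template.
    HALFVEC_COSINE_DISTANCE("<=>", "halfvec_cosine_ops",
            "SELECT *, embedding <=> ? AS distance FROM %s WHERE embedding <=> ? < ? %s ORDER BY distance LIMIT ? ");

    public final String operator;
    public final String index;
    public final String similaritySearchSqlTemplate;

    HalfvecDistanceType(String operator, String index, String similaritySearchSqlTemplate) {
        this.operator = operator;
        this.index = index;
        this.similaritySearchSqlTemplate = similaritySearchSqlTemplate;
    }

    /** Builds an HNSW index statement in the same form as the workaround schema. */
    public String indexDdl(String table) {
        return "CREATE INDEX ON " + table + " USING HNSW (embedding " + index + ")";
    }

    public static void main(String[] args) {
        for (HalfvecDistanceType type : values()) {
            System.out.println(type + ": " + type.indexDdl("vector_store"));
        }
    }
}
```

Keeping the halfvec constants alongside the existing vector ones would let the index operator class follow the column type, while the query templates stay the same apart from the operator.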