Vector search support
This pull request adds initial support for vector search in libSQL.
Highlights
- Vector column type for storing vectors in tables.
- Vector indexes that are updated automatically when the table is updated.
- Exact vector search with metadata filtering using plain SQL.
- Approximate vector search using the new vector_top_k() function, which is backed by a DiskANN-based vector index.
Usage
Creating a table with a vector column:
CREATE TABLE movies (
  title TEXT,
  year INT,
  embedding FLOAT32(3)
);
Inserting vector data:
INSERT INTO movies (title, year, embedding)
VALUES
  ('Napoleon', 2023, vector('[1,2,3]')),
  ('Black Hawk Down', 2001, vector('[10,11,12]')),
  ('Gladiator', 2000, vector('[7,8,9]')),
  ('Blade Runner', 1982, vector('[4,5,6]'));
Creating an index on a vector column:
CREATE INDEX movies_idx USING vector_cosine_ops ON movies (embedding);
Finding top-k similar rows (exact):
SELECT title, year FROM movies ORDER BY vector_distance_cos(embedding, '[3,1,2]') LIMIT 3;
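To make the semantics of the exact query concrete, here is a minimal Python sketch of the same brute-force ranking over the sample rows. It is not libSQL code. It assumes that vector_distance_cos returns cosine distance defined as 1 minus cosine similarity. The names MOVIES, cosine_distance, exact_top_k, and the min_year filter are hypothetical helpers introduced only for illustration; min_year stands in for the kind of metadata predicate a plain SQL WHERE clause would express.

```python
import math

# Sample rows copied from the INSERT statement above: (title, year, embedding).
MOVIES = [
    ("Napoleon", 2023, [1.0, 2.0, 3.0]),
    ("Black Hawk Down", 2001, [10.0, 11.0, 12.0]),
    ("Gladiator", 2000, [7.0, 8.0, 9.0]),
    ("Blade Runner", 1982, [4.0, 5.0, 6.0]),
]


def cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(rows, query, k, min_year=None):
    """Scan every row, optionally filter on year, and return the k nearest."""
    # Hypothetical metadata filter, analogous to a WHERE clause on year.
    candidates = [r for r in rows if min_year is None or r[1] >= min_year]
    # Sort by ascending distance, like ORDER BY ... LIMIT k.
    ranked = sorted(candidates, key=lambda r: cosine_distance(r[2], query))
    return [(title, year) for title, year, _ in ranked[:k]]


if __name__ == "__main__":
    # Mirrors: ORDER BY vector_distance_cos(embedding, '[3,1,2]') LIMIT 3
    print(exact_top_k(MOVIES, [3.0, 1.0, 2.0], 3))
    # Same ranking restricted to movies from 2000 onward.
    print(exact_top_k(MOVIES, [3.0, 1.0, 2.0], 3, min_year=2000))
```

Because every row is scanned, the cost grows linearly with table size, which is the trade-off the approximate vector_top_k() path is meant to avoid.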
Finding top-k similar rows (approximate):
SELECT
  title,
  year
FROM
  vector_top_k('movies_idx', '[4,5,6]', 3)
JOIN
  movies
ON
  movies.rowid = id;
Limitations
- Index key is always rowid, primary keys not supported.
- CREATE INDEX does not index rows that already exist in the base table.
- Vector index uses 32-bit per vector element, which causes redundant I/O and space amplification.
Any status update on this PR? We can hardly wait for vector support 🙏🏻 Thanks for the work, btw!
@pax-k I am actively working with folks to iron out some bugs and then get this merged.
This work has been merged as part of the following PRs:
https://github.com/tursodatabase/libsql/pull/1531
https://github.com/tursodatabase/libsql/pull/1551
https://github.com/tursodatabase/libsql/pull/1557
https://github.com/tursodatabase/libsql/pull/1560
Therefore, closing this.