
Support vector index on multiple columns

Open nicktobey opened this issue 11 months ago • 0 comments

Our vector index workflow is currently modeled on MariaDB's, which defines a vector index on a single column containing the entire vector. We support vector indexes on JSON columns today and plan to add support for a dedicated VECTOR type.

However, some vector datasets on HuggingFace take a different approach and store each dimension of the vector in a separate float column. Importing these datasets into Dolt in a way that makes them indexable requires extra steps to convert the representation of the data.
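
For illustration, the extra conversion step might look roughly like the sketch below. It assumes a four-dimensional dataset imported into a hypothetical table `imported` with one float column per dimension, packs those columns into a single JSON column (the name `v` is made up here), and then indexes that column. The `CREATE VECTOR INDEX` statement assumes MariaDB-style syntax, and `JSON_ARRAY` is the standard MySQL function; neither detail is confirmed by this issue.

```sql
-- Hypothetical table standing in for a dataset imported with one column per dimension.
CREATE TABLE imported(pk int primary key, v0 float, v1 float, v2 float, v3 float);
INSERT INTO imported VALUES (1, 0.1, 0.2, 0.3, 0.4), (2, 0.5, 0.6, 0.7, 0.8);

-- Extra step 1: add a single column that will hold the entire vector.
ALTER TABLE imported ADD COLUMN v JSON;

-- Extra step 2: pack the per-dimension floats into one JSON array per row.
UPDATE imported SET v = JSON_ARRAY(v0, v1, v2, v3);

-- Only now can the vector be indexed (syntax assumed, see the note above).
CREATE VECTOR INDEX vidx ON imported(v);
```

After this conversion the original float columns are redundant, so the data is either stored twice or the columns have to be dropped, which is part of the friction a multi-column vector index would remove.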

There's no reason we couldn't support vector indexes on multiple float columns directly. For example:

CREATE TABLE test(pk int primary key, v0 float, v1 float, v2 float, v3 float);
CREATE VECTOR INDEX vidx ON test(v0, v1, v2, v3);

nicktobey • Dec 10 '24 22:12