sqlite-vec icon indicating copy to clipboard operation
sqlite-vec copied to clipboard

"OperationalError: A LIMIT or 'k = ?' constraint is required on vec0 knn queries"

Open mholt opened this issue 1 year ago • 5 comments

Hi, just a quick question that I am not sure is a bug or me "holding it wrong" :)

I have this schema and query:

CREATE VIRTUAL TABLE image_embeddings USING vec0(embedding float[768]);

CREATE TABLE images(
    filename TEXT,
    embedding_id INTEGER,
    FOREIGN KEY ("embedding_id") REFERENCES "image_embeddings"("rowid") ON UPDATE CASCADE ON DELETE CASCADE
);

-- ... a bunch of inserts...

SELECT
    images.filename,
    image_embeddings.distance
FROM images
JOIN image_embeddings ON image_embeddings.rowid = images.embedding_id
WHERE image_embeddings.embedding MATCH ?
ORDER BY image_embeddings.distance
LIMIT 3

I get:

"OperationalError: A LIMIT or 'k = ?' constraint is required on vec0 knn queries"

But I do have a LIMIT clause.

If I add AND k=3 to my WHERE clause, it works. But I think it should work with a LIMIT clause?

Bonus question: I actually wanted to use LEFT JOIN(instead of JOIN) as in the example code, but when I use LEFT JOIN, I get: unable to use function MATCH in the requested context. Any idea why?

mholt avatar Oct 08 '24 16:10 mholt

Hey @mholt , big fan of your work!

To get around that limit/k error message, try a k = 3 constraint instead:

SELECT *
FROM image_embeddings
WHERE embedding MATCH ?
  AND k = 3

The LIMIT 3 syntax only works in SQLite version 3.41 or above, so I usually just recommend the k = 3 syntax instead (which works everywhere).

For the other issue: Try selecting from image_embeddings and left joining images like so:

SELECT
    images.filename,
    image_embeddings.distance
FROM image_embeddings
LEFT JOIN images ON images.embedding_id = image_embeddings.rowid 
WHERE image_embeddings.embedding MATCH ?
  AND image_embeddings.k = 3

Sometimes that may not be enough, the SQLite query planner gets weird with virutal tables. So a separate CTE step may be necessary:

WITH vec_matches AS (
  SELECT
    *
   -- This row_number() expression is sometimes needed if you see query planner errors
   --, row_number() OVER (ORDER BY distance) AS rank
  FROM image_embeddings
  WHERE image_embeddings.embedding MATCH ?
    AND image_embeddings.k = 3
)
SELECT 
  images.filename, 
  vec_matches.distances
FROM vec_matches
LEFT JOIN images ON images.embedding_id = vec_matches.rowid

The reason why the LEFT JOIN wasn't working on the original query was because the FROM clause was referencing the images table, meaning it would go through every row in images and LEFT JOIN it with a KNN query on image_embeddings. Instead you can directly do a single KNN query on image_embeddings, then LEFT JOIN those results with images to get back the metadata.

Let me know if that works out! Sometimes the JOIN queries can get a little tricky.

asg017 avatar Oct 08 '24 17:10 asg017

big fan of your work!

Big fan of MY work? I'm a big fan of YOUR work! :smile:

The LIMIT 3 syntax only works in SQLite version 3.41 or above, so I usually just recommend the k = 3 syntax instead (which works everywhere).

Interesting! SELECT sqlite_version() reports 3.46.0... so, maybe I just have a funky query? (I'll use k= as a workaround for now, but I found it surprising.)

The reason why the LEFT JOIN wasn't working on the original query was because...

Ahh.. that makes sense. And works! Thank you!

And actually... if I select from the image_embeddings table, I don't think I need a left join at all, since they should all be referenced by an image. So, maybe that's just better overall :+1:

mholt avatar Oct 08 '24 17:10 mholt

I experience the same issue with sqlite version 3.49.1 and the following structure/queries:

CREATE VIRTUAL TABLE face_embeddings USING vec0(
    embedding float[128],
    id INTEGER PRIMARY KEY,
    media_id INTEGER,
    cluster_id INTEGER DEFAULT 0,
    dirty BOOLEAN DEFAULT TRUE,
);

CREATE TABLE IF NOT EXISTS face_cluster_exclusions (
    face_id INTEGER NOT NULL,
    cluster_id INTEGER NOT NULL,
    PRIMARY KEY (face_id, cluster_id)
) WITHOUT ROWID;
SELECT embedding, id, cluster_id, distance, core FROM face_embeddings
WHERE embedding MATCH ? 
AND cluster_id NOT IN  (
    SELECT cluster_id
    FROM face_cluster_exclusions exc
    WHERE exc.face_id = ?
)
AND id != ?
AND k = ?
ORDER BY distance;

this works fine, but doesn't always return k results (even if there would be k entries with the given criteria), because apparently the k-cutoff is evaluated before the other clauses.

If I put the query like this:

SELECT embedding, id, cluster_id, distance, core FROM face_embeddings
WHERE embedding MATCH ? 
AND cluster_id NOT IN  (
    SELECT cluster_id
    FROM face_cluster_exclusions exc
    WHERE exc.face_id = ?
)
AND id != ?
ORDER BY distance
LIMIT ?;

I get "A LIMIT or 'k = ?' constraint is required on vec0 knn queries"

simmac avatar Aug 18 '25 12:08 simmac

Unfortunately the NOT IN (...) operator won't work — SQLite virtual tables can detect and override IN (...) operators, but not NOT IN (...).

I asked about support for this in the SQLite forum, but I'm unsure if it'll ever be supported.

I'll try to think of a few good workaround that'll fit your use-case

asg017 avatar Aug 18 '25 15:08 asg017

I see, thank you! My current workaround is to manually re-query with a progressively (exponentially) higher k until I either get k results or a row that's over my distance threshold (it would be more clean to all have this in the query of course, but it works for now)

simmac avatar Aug 18 '25 18:08 simmac