
`vec_static_blob_entries` should support other distance metrics

Open asg017 opened this issue 1 year ago • 5 comments

Only uses L2 distance for now

asg017 avatar Jul 30 '24 15:07 asg017

@asg017 thanks for this package. Just dropping by to give a vote in favor of implementing this.

If I understand correctly, this is what handles the distance in a `WHERE vector MATCH '[...]' LIMIT k` query. If so, I would find it useful to be able to use other distance metrics here. I find cosine similarity produces better results for my data, and I also have existing code relying on cosine similarity thresholds, so it would be great to be able to use it here.

mfonda avatar Aug 30 '24 18:08 mfonda

@mfonda are you using the vec_static_blob_entries table, or vec0 tables?

The "static blob" approach is experimental and undocumented, so if you're using vec0 tables, you can specify which distance metric to use like so:

```sql
create virtual table vec_items using vec0(
  text_embedding float[128] distance_metric=cosine
);
```

Only `l1`, `l2`, and `cosine` are currently supported. The default is `l2`.
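For readers weighing these options, here is a minimal pure-Python sketch of the three metrics and of how the choice can change which neighbor ranks first. It is an illustration only and is not sqlite-vec's implementation. The function names are hypothetical, and cosine distance is assumed here to mean 1 minus cosine similarity.

```python
import math


def distance_l1(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_l2(a, b):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_cosine(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [10.0, 0.0]  # same direction as the query, much larger magnitude
nearby_point = [1.0, 1.0]     # close in space, but rotated 45 degrees

for name, fn in [("l1", distance_l1), ("l2", distance_l2), ("cosine", distance_cosine)]:
    print(name, fn(query, same_direction), fn(query, nearby_point))
```

Under L1 and L2, `nearby_point` is the closer vector. Under cosine, `same_direction` is closer, because cosine distance ignores magnitude. That difference is why thresholds tuned for one metric do not carry over to another.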

asg017 avatar Aug 30 '24 19:08 asg017

You can also always use the `vec_distance_l1()`, `vec_distance_l2()`, and `vec_distance_cosine()` scalar SQL functions manually, but the `MATCH` operation on vec0 virtual tables will be faster.
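To make the "manual" route concrete, here is a rough sketch of the query shape: call a scalar distance function on every row, sort, and keep the top k. It uses only Python's standard `sqlite3` module. `py_distance_cosine` is a hypothetical stand-in registered from Python, not the extension's `vec_distance_cosine()`. The `items` table and its JSON-text embeddings are also assumptions made for the example.

```python
import json
import math
import sqlite3


def py_distance_cosine(a_json, b_json):
    """Hypothetical stand-in for a scalar distance function (1 - cosine similarity)."""
    a = json.loads(a_json)
    b = json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it as a scalar function.
conn.create_function("py_distance_cosine", 2, py_distance_cosine)

# Plain table with embeddings stored as JSON text (an assumption for this sketch).
conn.execute("create table items(id integer primary key, embedding text)")
conn.executemany(
    "insert into items(id, embedding) values (?, ?)",
    [
        (1, json.dumps([1.0, 0.0])),
        (2, json.dumps([0.0, 1.0])),
        (3, json.dumps([0.9, 0.1])),
    ],
)


def brute_force_knn(conn, query, k):
    """Score every row with the scalar function, then sort and keep the k nearest."""
    return conn.execute(
        "select id, py_distance_cosine(embedding, ?) as distance "
        "from items order by distance limit ?",
        (json.dumps(query), k),
    ).fetchall()


nearest = brute_force_knn(conn, [1.0, 0.0], 2)
print(nearest)
```

The same query shape should apply with the extension's own scalar functions. The distance is evaluated for every row before sorting, so the work grows with the size of the table.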

asg017 avatar Aug 30 '24 19:08 asg017

@asg017 I'm using vec0 tables. Your code example using distance_metric=cosine is exactly what I was looking for.

Thank you very much for the info. Sorry for the noise here. I'm not very familiar with C or SQLite internals, and I was attempting to read the code and figure out if there was a way to use cosine as the distance metric. Seems I was looking in the wrong place.

mfonda avatar Aug 30 '24 19:08 mfonda

No problem, I should have this documented better!

asg017 avatar Aug 30 '24 19:08 asg017