
Any plans to add cosine similarity to the list of metrics?

Open · greenersharp opened this issue 1 year ago · 0 comments

I'm learning Arraymancer and experimenting with text embeddings.

In Python I use SentenceTransformers for the embeddings and scikit-learn's KNeighborsClassifier with the cosine metric to find the closest matches.

It seems that Arraymancer doesn't support a cosine metric. Are there plans to add one? I was using `kdtree` with the Euclidean metric, and the results were all wrong.
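For reference, cosine similarity is cos(a, b) = a·b / (|a| |b|). Below is a minimal brute-force sketch in plain Nim (standard library only, no Arraymancer calls). It scores every row against the query, so it needs no special metric support in a k-d tree. The helpers `dot`, `cosineSimilarity` and `topKCosine` are hypothetical names chosen for illustration.

```nim
import std/[algorithm, math]

# Hypothetical helpers for illustration; not part of Arraymancer.

proc dot(a, b: openArray[float64]): float64 =
  ## Plain dot product; assumes a.len == b.len.
  for i in 0 ..< a.len:
    result += a[i] * b[i]

proc cosineSimilarity(a, b: openArray[float64]): float64 =
  ## a·b / (|a| * |b|); gives NaN if either vector is all zeros.
  dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))

proc topKCosine(data: seq[seq[float64]], query: seq[float64], k: int): seq[int] =
  ## Brute force: score every row against `query` and return the
  ## indices of the `k` rows with the highest cosine similarity.
  var scored: seq[(float64, int)]
  for i, row in data:
    scored.add((cosineSimilarity(row, query), i))
  # Sort by similarity, highest first.
  scored.sort(proc (x, y: (float64, int)): int = cmp(y[0], x[0]))
  for j in 0 ..< min(k, scored.len):
    result.add(scored[j][1])

# Toy data standing in for the embedding rows.
let data = @[@[1.0, 0.0], @[0.0, 1.0], @[1.0, 1.0]]
echo topKCosine(data, @[1.0, 0.1], k = 2)
```

A full scan costs one dot product per row for each query. The trade-off is simplicity: it works with any similarity function, at the price of not pruning anything the way a tree does.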

Alternatively, can Arraymancer help me normalize the text embeddings, so that I can use the Euclidean metric and still get good results?
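Normalizing should be enough in principle. For unit-length vectors u and v, |u - v|^2 = 2 - 2 cos(u, v), so ranking by Euclidean distance gives the same order as ranking by cosine similarity. Here is a minimal sketch that assumes Arraymancer's broadcasting operators `*.` and `/.`, `sum(t, axis)` and element-wise `sqrt` behave as in recent versions. `l2NormalizeRows` is a hypothetical helper name, and the toy tensor stands in for the real embedding matrix.

```nim
import arraymancer

proc l2NormalizeRows(x: Tensor[float64]): Tensor[float64] =
  ## Hypothetical helper: divide each row by its L2 norm.
  ## Assumes no row is all zeros (that would produce NaN).
  let norms = sqrt(sum(x *. x, axis = 1))  # shape [n, 1]: one norm per row
  result = x /. norms                      # broadcast across the columns

# Toy data standing in for the embedding matrix.
let vectors = [[3.0, 4.0],
               [0.0, 2.0],
               [1.0, 1.0]].toTensor
let unit = l2NormalizeRows(vectors)

# Euclidean k-d tree over unit vectors: the nearest by distance
# is also the most similar by cosine.
let kd = kdtree(unit)
let (dist, ix) = kd.query(unit[0, _].squeeze(0), k = 2)

# For unit vectors, cosine similarity = 1 - d^2 / 2.
let cosSim = dist.map_inline(1.0 - x * x / 2.0)
echo ix, cosSim
```

With the real data, the same idea would mean calling `l2NormalizeRows` on the tensor returned by `read_npy[float64]("title_vectors.txt.npy")` before building the tree. The query vector would need the same normalization.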

Here is my code:

```nim
import arraymancer

# Load the embedding matrix from a .npy file
let vectors = read_npy[float64]("title_vectors.txt.npy")

echo vectors.shape
# [1226242, 350]

let kd = kdtree(vectors)
# Find the 3 closest entries to the first one (Euclidean metric)
let (dist, ix) = kd.query(vectors[0, _].reshape(350), k = 3)
```

Another thing I'm confused about is why I need `reshape(350)`. When I tried `let (dist, ix) = kd.query(vectors[0, _], k = 3)`, it failed with: `Broadcasting error: non-singleton dimensions must be the same in both tensors.`
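The likely cause is that Arraymancer's slicing keeps the sliced axis. `vectors[0, _]` has shape `[1, 350]` (rank 2), while `query` appears to expect a single point as a rank-1 tensor of shape `[350]`. This is a toy sketch that assumes `squeeze(axis)` is available. It drops the length-1 axis without hard-coding the width.

```nim
import arraymancer

# Toy matrix standing in for the embeddings.
let vectors = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]].toTensor

let row = vectors[0, _]      # slicing keeps axis 0: shape [1, 3], rank 2
let point = row.squeeze(0)   # drop the length-1 axis 0: shape [3], rank 1
echo row.shape, " ", point.shape
```

For the real data, `vectors[0, _].squeeze(0)` should then be interchangeable with `vectors[0, _].reshape(350)`.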

Thanks

greenersharp · Nov 05 '24 14:11