Can't load a "euclidean" index from `hnsw_build`

Open jlmelville opened this issue 4 months ago • 0 comments

Build a Euclidean index via hnsw_build:

irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism, distance = "euclidean")
iris_nn <- hnsw_search(irism, ann, k = 5)
head(iris_nn$dist)
     [,1]      [,2]      [,3]      [,4]      [,5]
[1,]    0 0.1000000 0.1414212 0.1414212 0.1414213
[2,]    0 0.1414213 0.1414213 0.1414213 0.1732050
[3,]    0 0.1414213 0.2449490 0.2645751 0.2645753
[4,]    0 0.1414215 0.1732051 0.2236071 0.2449490
[5,]    0 0.1414212 0.1414213 0.1732050 0.1732050
[6,]    0 0.3316623 0.3464102 0.3605552 0.3741659

So far so good. Now save it:

ann$save("iris.hnsw")

The class of ann is:

class(ann)
[1] "Rcpp_HnswL2"
attr(,"package")
[1] "RcppHNSW"

so we should be able to load it with:

ann2 <- methods::new(RcppHNSW::HnswL2, 4, "iris.hnsw")

Now search again:

iris_nn2 <- hnsw_search(irism, ann2, k = 5)
head(iris_nn2$dist)
     [,1]       [,2]       [,3]       [,4]       [,5]
[1,]    0 0.01000000 0.01999996 0.01999996 0.01999998
[2,]    0 0.01999998 0.01999998 0.01999998 0.02999999
[3,]    0 0.01999998 0.06000003 0.07000001 0.07000010
[4,]    0 0.02000003 0.03000002 0.05000012 0.06000003
[5,]    0 0.01999996 0.01999998 0.02999996 0.02999999
[6,]    0 0.10999985 0.12000003 0.13000003 0.14000012

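These are squared Euclidean distances (0.1 has become 0.01, 0.1414 has become 0.02), which is what the L2 class returns, as its name suggests.

So after saving and reloading a formerly Euclidean index, you must take the square root of the returned distances yourself.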
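
As a minimal sketch of that manual conversion, the snippet below takes the first row of each distance matrix printed above, copied in by hand so it runs without the package, and checks that the square root of the reloaded values matches the original Euclidean values to within print rounding. The variable names are illustrative only. With the objects from this issue, the equivalent step would presumably be `iris_nn2$dist <- sqrt(iris_nn2$dist)`.

```r
# First row of distances from the reloaded HnswL2 index (copied from the output above)
reloaded_dist <- c(0, 0.01000000, 0.01999996, 0.01999996, 0.01999998)

# First row of distances from the index returned by hnsw_build (copied from the output above)
original_dist <- c(0, 0.1000000, 0.1414212, 0.1414212, 0.1414213)

# Taking the square root converts squared L2 distances to Euclidean distances
converted_dist <- sqrt(reloaded_dist)

# Compare with a loose tolerance, because the printed values are rounded
matches <- isTRUE(all.equal(converted_dist, original_dist, tolerance = 1e-5))
print(matches)
```

Only the distances need converting; the neighbor ordering is unaffected, because the square root is monotonic.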

The fix will probably be to introduce a dedicated `RcppHNSW::HnswEuclidean` class that takes the square root for you inside its methods. `hnsw_build` would then return an object of this class when `distance = "euclidean"`.

jlmelville, Mar 11 '24 00:03