rcpphnsw
Can't load a "euclidean" index from `hnsw_build`
Build a Euclidean index via `hnsw_build`:
irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism, distance = "euclidean")
iris_nn <- hnsw_search(irism, ann, k = 5)
head(iris_nn$dist)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0.1000000 0.1414212 0.1414212 0.1414213
[2,] 0 0.1414213 0.1414213 0.1414213 0.1732050
[3,] 0 0.1414213 0.2449490 0.2645751 0.2645753
[4,] 0 0.1414215 0.1732051 0.2236071 0.2449490
[5,] 0 0.1414212 0.1414213 0.1732050 0.1732050
[6,] 0 0.3316623 0.3464102 0.3605552 0.3741659
So far so good. Now save it:
ann$save("iris.hnsw")
The class of `ann` is:
class(ann)
[1] "Rcpp_HnswL2"
attr(,"package")
[1] "RcppHNSW"
so we should be able to load it with:
ann2 <- methods::new(RcppHNSW::HnswL2, 4, "iris.hnsw")
Now search again:
iris_nn2 <- hnsw_search(irism, ann2, k = 5)
head(iris_nn2$dist)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0.01000000 0.01999996 0.01999996 0.01999998
[2,] 0 0.01999998 0.01999998 0.01999998 0.02999999
[3,] 0 0.01999998 0.06000003 0.07000001 0.07000010
[4,] 0 0.02000003 0.03000002 0.05000012 0.06000003
[5,] 0 0.01999996 0.01999998 0.02999996 0.02999999
[6,] 0 0.10999985 0.12000003 0.13000003 0.14000012
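These are the squared Euclidean distances: each value is the square of the corresponding entry in the first result (0.1 becomes 0.01, 0.1414 becomes 0.02). That is what the `HnswL2` class returns, as its name suggests.
So after saving and reloading a formerly Euclidean index, you must take the square root of the returned distances yourself.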
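Until a fix lands, the manual conversion is a single `sqrt` call on the distance matrix. Here is a minimal base R sketch, using the first two rows of the reloaded output above as stand-in data (the variable names `sq_dist` and `fixed_dist` are only for illustration):

```r
# Distances returned by the reloaded index: first two rows of the output above.
sq_dist <- matrix(
  c(0, 0.01000000, 0.01999996, 0.01999996, 0.01999998,
    0, 0.01999998, 0.01999998, 0.01999998, 0.02999999),
  nrow = 2, byrow = TRUE
)

# Element-wise square root recovers the Euclidean distances.
fixed_dist <- sqrt(sq_dist)
fixed_dist
```

With the real objects, the equivalent would presumably be `iris_nn2$dist <- sqrt(iris_nn2$dist)`, after which `iris_nn2$dist` should match `iris_nn$dist` up to floating-point error.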
The fix for this will probably be to introduce a dedicated `RcppHNSW::HnswEuclidean` class that does the square-rooting for you inside a method. This will be returned from `hnsw_build` when `distance = "euclidean"`.