annoy parameters consideration for getting the best search match

Open apalvanov opened this issue 4 years ago • 0 comments

Hello,

In our use case, we are testing Annoy with indexes of different sizes, from 1K to 2M vectors. We use two methods:

a.build(n_trees)
a.get_nns_by_vector(v, n, search_k, include_distances)

We did some tests while setting different params:

n_trees = [50, 150]
search_k = [-1 (default), 5000, 15000, 25000, 50000] (when search_k was fixed, n was set to 100)
n (number of approximate nearest neighbours) = [100, 130, 150, ..., 200, ..., 400, ...] (in this scenario search_k was left at its default, so it was determined by n)

We ran a few tests across combinations of these parameters.

We saw that the expected result was sometimes missing from the values returned. That is, there was a better match in the index that we didn't get.

For example, the docs say that the higher n_trees is when building the index, the better (let's assume I have enough disk and memory), but in practice a higher n_trees actually decreased the accuracy of the results we expected.

In addition, the larger the 'n' and/or 'search_k' values we provided, the better the results we received.
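For context, the Annoy README says that when search_k is not provided it defaults to n * n_trees, which would explain why raising 'n' alone also widens the search. A minimal sketch of that arithmetic for the values tested above; `effective_search_k` is a hypothetical helper, not part of the Annoy API:

```python
def effective_search_k(n, n_trees, search_k=-1):
    """Search budget implied by the documented default (n * n_trees).

    Hypothetical helper for illustration only; Annoy applies this
    default internally when search_k is left at -1.
    """
    return n * n_trees if search_k == -1 else search_k


# Print the implied budget for a few of the tested combinations.
for n_trees in (50, 150):
    for n in (100, 200, 400):
        print(n_trees, n, effective_search_k(n, n_trees))
```

Under that default, n = 100 with n_trees = 50 implies the same budget as an explicit search_k of 5000, so the two test scenarios overlap more than they might appear to.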

How can we ensure, or at least raise the chances, that the expected result is returned? Which parameter considerations should we take into account? Do they depend on the size of the index? I guess always raising search_k as high as possible is not the right solution, since both the index size and the query vector change (the lower the accuracy, the higher the search_k/n we need to use...).
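One way to put numbers on "the expected result was not returned" is to compare Annoy's answers with a brute-force search over the same vectors and report recall. The sketch below assumes Euclidean distance and small sample sizes; `exact_neighbors`, `recall`, `mean_recall` and `demo` are hypothetical names, and only `AnnoyIndex`, `add_item`, `build` and `get_nns_by_vector` come from Annoy itself:

```python
import math
import random


def exact_neighbors(vectors, query, n):
    """Brute-force ids of the n nearest vectors (Euclidean ground truth)."""
    order = sorted(range(len(vectors)),
                   key=lambda i: math.dist(vectors[i], query))
    return order[:n]


def recall(approx_ids, exact_ids):
    """Fraction of the true neighbours that the approximate search found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(index, vectors, queries, n, search_k=-1):
    """Average recall of index.get_nns_by_vector over sample queries."""
    total = 0.0
    for q in queries:
        approx = index.get_nns_by_vector(q, n, search_k=search_k)
        total += recall(approx, exact_neighbors(vectors, q, n))
    return total / len(queries)


def demo(n_items=2000, dim=16, n_trees=50, n=100):
    """Illustrative harness on random data; requires `pip install annoy`."""
    from annoy import AnnoyIndex  # imported lazily so the helpers work alone

    rng = random.Random(0)
    vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
    index = AnnoyIndex(dim, "euclidean")
    for i, v in enumerate(vectors):
        index.add_item(i, v)
    index.build(n_trees)

    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
    for search_k in (-1, 5000, 15000, 25000, 50000):
        print(search_k,
              round(mean_recall(index, vectors, queries, n, search_k), 3))
```

Running `demo()` on a sample of the real data rather than random vectors, at each index size of interest, would show how recall moves with n_trees and search_k for that particular dataset.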

Thanks a lot

Mar 25 '20 19:03 apalvanov

bump

Sep 07 '22 07:09 EY4L

My suggestion is to set n_trees as high as you can, as long as you can afford the build time and the index still fits in RAM.

Then set search_k as a tradeoff between recall and query time – higher search_k will improve recall, at the cost of longer search times.
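As a rough sketch of how that tradeoff could be tuned in practice: try search_k values in increasing order and keep the smallest one that reaches a target recall. `measure` and `pick_search_k` are hypothetical helpers, `query_fn` is any callable wrapping the index, and `truth` is assumed to hold the exact neighbour ids for each query (for example from a brute-force pass):

```python
import time


def measure(query_fn, queries, truth, n, search_k):
    """Return (overall recall, mean seconds per query) for one search_k."""
    start = time.perf_counter()
    results = [query_fn(q, n, search_k) for q in queries]
    elapsed = time.perf_counter() - start

    hits = 0
    expected = 0
    for found, exact in zip(results, truth):
        hits += len(set(found) & set(exact))
        expected += len(exact)
    return hits / expected, elapsed / len(queries)


def pick_search_k(query_fn, queries, truth, n, candidates, target_recall=0.95):
    """Smallest candidate that meets target_recall, else the best one seen.

    Candidates are expected to be positive search_k values.
    Returns a (search_k, recall, seconds_per_query) tuple.
    """
    best = None
    for search_k in sorted(candidates):
        rec, secs = measure(query_fn, queries, truth, n, search_k)
        if best is None or rec > best[1]:
            best = (search_k, rec, secs)
        if rec >= target_recall:
            return search_k, rec, secs
    return best
```

With a real index `a`, `query_fn` could be `lambda v, n, k: a.get_nns_by_vector(v, n, search_k=k)`. The chosen value only holds for the index size and query sample it was measured on, so it would need re-checking as the index grows.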

Sep 07 '22 16:09 erikbern
