What does the function prune2() do?

Open zhaokang0826 opened this issue 7 years ago • 1 comments

From the wiki, I know the function prune2() is to improve online search speed. But when I read the code, I do not know what happened, and how does the function improve online search speed? Can you give more details?

Jun 21 '18 11:06 zhaokang0826

I don't recommend work upon prune2 or trying to understand it as the code hasn't been touched or used for long.

The idea is to speedup searching process by making the graph smaller. Recall that during search, when we arrive upon a point (graph node), we visit all it's neighbors. So the less neighbors we have the faster we become (and most times also less accurate). The original graph is a regular one, in that all nodes have the same number (M) of neighbors. Because the local geometry is very different from point to point in a high-dimensional space, it's not hard to imagine all those M neighbors are not equally useful in all cases. Prune2 tries to exploit this idea, and remove the neighbors that are less useful in generating better k-NN approximations.

More specifically, prune2 tries to compute a subgraph G' of the orignal graph G of the same nodes, such that immediate neighbors in G are separated by at most 1 extra step in G'. In other words, if B is A's neighbor in G, then either B is A's neighbor in G' or B is A's neighbor C's neighbor in G'.

Say we are at A, and say if B is a close neighbor to our query that would make an improvement of the current results. With graph G, B is found without any problem. We have reason to believe that because both B and A (by the fact that it has been visited at all) are both near neighbors to the query, their common neighbor C might well be a near neighbor to the query, too, and is likely to have been visited or will be visited in the future. Once C is reached, B will be reached, too, and we don't miss any information by pruning B.

Jun 22 '18 21:06 aaalgo