
knn-graph construction benchmark

Open frankier opened this issue 3 years ago • 15 comments

Firstly, thank you very much for creating this benchmark. It is very helpful for navigating this exciting but somewhat overwhelming field.

I am interested in particular in the performance of constructing a knn-graph from data points. I wonder whether this would be an interesting additional benchmark. I believe the results might differ somewhat from the existing results, since:

  1. It would combine query performance and index construction in one measure.
  2. It would put techniques which create such a graph as part of their construction (like NN-descent) at an advantage.

What do you think?

frankier avatar Dec 27 '20 10:12 frankier

What measure do you suggest? You could edit https://github.com/erikbern/ann-benchmarks/blob/master/ann_benchmarks/plotting/plot_variants.py to include different measures that relate build time and qps.

Note that the interactive website (e.g., http://ann-benchmarks.com/glove-100-angular_10_angular.html) might contain plots of interest here. One of them relates recall to index size/qps, tying index size and query performance together. It would be easy to do the same with build time as well, but it's a bit difficult to interpret.

maumueller avatar Dec 28 '20 10:12 maumueller

You could sort of infer it by adding the index build time and the extrapolated query time for the full dataset. In a way I like this metric because it encapsulates everything (whereas currently we ignore the index building time in the benchmarks, as long as the index completes within 1h).
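A minimal sketch of the combined metric described above (the helper name and the numbers are hypothetical, just to make the formula concrete):

```python
def knn_graph_cost(build_time_s, qps, n_points):
    """Estimate knn-graph construction time as index build time plus the
    extrapolated time to run one query per data point. For algorithms that
    expose the graph at build time this is only an upper bound."""
    return build_time_s + n_points / qps

# Hypothetical index: 120 s build, 5000 queries/s, one million points.
total_s = knn_graph_cost(120.0, 5000.0, 1_000_000)  # 120 + 200 = 320 s
```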

erikbern avatar Dec 28 '20 20:12 erikbern

It's true that the (extrapolated) KNN query time is very general and probably the most useful benchmark for most people. However, kNN graph construction is also quite general: e.g. it's the recommended way to do unsupervised learning in scikit-learn when there are many high-dimensional data points, including clustering and manifold learning. "Many high-dimensional points" means basically anything the built-in BallTree method doesn't scale to -- which is quite a lot of things. Here's some background:

  • http://tomdlt.github.io/decks/2018_pydata/#43
  • https://scikit-learn.org/stable/auto_examples/neighbors/approximate_nearest_neighbors.html
  • https://github.com/scikit-learn/scikit-learn/issues/10463
  • https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-transformer
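For concreteness, this is roughly the scikit-learn use-case being described: a sparse kNN graph built once and then fed to downstream clustering or manifold learning. A sketch with random data (`kneighbors_graph` uses exact search by default, which is exactly the part that doesn't scale):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))  # 100 points in 32 dimensions

# Sparse (100, 100) matrix with 5 stored distances per row; downstream
# estimators (e.g. spectral clustering, Isomap) can consume this directly,
# or an ANN library can be swapped in via the KNeighborsTransformer API.
graph = kneighbors_graph(X, n_neighbors=5, mode="distance")
```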

I don't think index construction time + NN query time × dataset size simply gives the knn graph construction time: for some systems, constructing the index amounts to constructing a knn graph, so this would penalise those systems. See for example pynndescent: https://github.com/lmcinnes/pynndescent/blob/master/pynndescent/pynndescent_.py#L1963 where the fit_transform(...) method shortcuts querying and just copies the knn graph out of the index. It does provide an upper bound, however.

It is a bit annoying since it becomes more difficult to show O(f(n))-type trends for varying index size. I would propose to ignore this issue and just use the existing data sets, which already show some spread of performance.

frankier avatar Jan 08 '21 07:01 frankier

Absolutely agree with @frankier that this would be a great additional feature. I am working on manifold learning / clustering, and what I am interested in is exactly the kNN graph construction time.

It can probably be approximately computed as "index build time + the extrapolated query time for the full dataset" as @erikbern suggested above, however for some algorithms such as PyNNDescent (@lmcinnes) one should not add the extrapolated query time, and simply take the index build time alone.

If we had a list of algorithms for which this applies, then it would be very easy to add the require plots (as @maumueller wrote).
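A sketch of that idea, building on the "build time + extrapolated query time" formula (the algorithm list here is illustrative, not a vetted one):

```python
# Algorithms whose index build already yields the knn graph, so the
# extrapolated query term should be skipped. Illustrative only.
GRAPH_AT_BUILD = {"pynndescent"}

def knn_graph_time(algo, build_time_s, qps, n_points):
    if algo in GRAPH_AT_BUILD:
        return build_time_s
    return build_time_s + n_points / qps
```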

dkobak avatar Jul 11 '22 12:07 dkobak

Thanks for bringing this up again @dkobak. I think I'm understanding @frankier's problem description a bit better now. Isn't it exactly as @erikbern recommends: the index building time + the (extrapolated) query time, with the dataset itself used as the query set? This would be straightforward to implement from the existing data and doesn't even require re-running experiments.

None of the graph-based algorithms build the actual knn graph AFAIK, so the same rule should apply to them as well.

maumueller avatar Jul 11 '22 12:07 maumueller

As said, PyNNDescent does build the knn graph, but it might be that most of the others don't. It would be nice to know whether to use PyNNDescent or another option in this case though.

It could also be that some have cheaper queries when some batching is applied or similar so that building the knn graph ends up cheaper.

frankier avatar Jul 11 '22 12:07 frankier

I would be very surprised if that statement about PyNNDescent is true, @frankier. @lmcinnes: can you confirm it? In my world, you don't get a well-performing graph-based search if you don't prune neighbors.

maumueller avatar Jul 11 '22 12:07 maumueller

I can confirm that PyNNDescent builds an actual kNN graph during build time. If you build it with k neighbors and later only want to have k (or fewer) neighbors, then you don't need to query() at all. That's how PyNNDescent is used in openTSNE (which I am more familiar with): https://github.com/pavlin-policar/openTSNE/blob/master/openTSNE/nearest_neighbors.py#L520 and also in UMAP.

It seems that the way PyNNDescent is run in ann-benchmarks is to use query() all the time. But you wouldn't actually do it in practice if you want to build a kNN graph (at least openTSNE/UMAP do not).

That said, I am not sure how exactly PyNNDescent is run here to obtain different recall values.

Would be great if @lmcinnes could clarify.

dkobak avatar Jul 11 '22 13:07 dkobak

PyNNDescent was built with knn-graph construction in mind as a use-case. That means that the index construction is actually split into two parts, with the second part optional. The first phase builds an (approximate) knn-graph. You can stop at that point and simply use that if the knn-graph was the intended result. The second phase (induced by a call to prepare, or a query on an unprepared index) does a bunch of graph pruning and optimization (as well as JIT compiling the actual query function). So yes, all queries run on a pruned version of the graph. On the other hand the full graph is available earlier if desired.

In the current ann-benchmarks setup I have pynndescent calling prepare during index construction (since the benchmarks are query oriented, and it is sensible to ensure this phase is included in the index construction time if query performance is your goal). If, however, you are simply interested in knn-graph construction, then you can get that from pynndescent in less time than even the index construction time (since the prepare phase isn't required, but is a non-trivial part of the index construction time).

lmcinnes avatar Jul 11 '22 18:07 lmcinnes

Thanks Leland, that's very helpful. Do you know if any other algorithms currently included in ann-benchmarks construct kNN graph when building the index?

dkobak avatar Jul 13 '22 12:07 dkobak

I know that kgraph does. My understanding is that the various NGT algorithms do (and cache it). The bottom layer of an HNSW index is an approximation to one, and with suitably chosen parameters (to avoid over-pruning -- assuming those parameters are exposed by the implementation) I believe you might be able to induce a reasonable knn-graph from it. I would have to check the details on implementation, but at least in principle vamana (based on NSG) should be able to produce a knn at index-build time, again by skipping or tuning away the pruning.

I think if you wanted to create a different benchmark for knn-graph construction (and measure accuracy thereof) you could probably put together and tune a number of algorithms that could do it significantly faster than index build + query time. It is definitely a non-trivial amount of work to get that set up and the algorithms tuned of course.

lmcinnes avatar Jul 13 '22 16:07 lmcinnes

Thanks again. All good points.

dkobak avatar Jul 14 '22 11:07 dkobak

Any updates on this? I am deciding between nndescent and hnsw for kNN graph construction, and it would be helpful if we had benchmarks for the actual graph construction. Personal experience or benchmarks from anyone else would also be helpful.

vikigenius avatar May 17 '23 18:05 vikigenius

So the Recall/Build time graph on https://ann-benchmarks.com/glove-100-angular_10_angular.html says that index building on faiss-ivf and bruteforce-blas is many orders of magnitude faster than the others? Or is there more to it?

flying-sheep avatar Jun 22 '23 12:06 flying-sheep

In general, faiss/scann have faster build times, but the plots are a bit misleading: an implementation would score well even if its queries run extremely slowly, because query speed is not taken into account in this comparison.

I'm not sure how to plot these three parameters (recall/throughput/build time) in a good way.
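One possible option (a sketch, not part of ann-benchmarks; all numbers below are made up) is a scatter plot with build time as a third channel, e.g. colour:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical (recall, qps, build seconds) triples for four configurations.
recall = [0.80, 0.90, 0.95, 0.99]
qps = [12000, 8000, 4000, 900]
build_s = [40, 95, 300, 1200]

fig, ax = plt.subplots()
sc = ax.scatter(recall, qps, c=build_s, cmap="viridis")
ax.set_yscale("log")
ax.set_xlabel("Recall")
ax.set_ylabel("Queries per second (log scale)")
fig.colorbar(sc, label="Index build time (s)")
fig.savefig("recall_qps_build.png")
```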

maumueller avatar Jun 23 '23 07:06 maumueller