ann-benchmarks

Why is `lastfm-64-dot` called `-dot`?

Open thomasahle opened this issue 2 years ago • 5 comments

From the path http://ann-benchmarks.com/lastfm-64-dot_10_angular.html it seems that this dataset is actually angular. But the name indicates dot-product, which many of the algorithms don't natively support.
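A side note for readers, not part of the original post: the distinction matters because the two measures can rank the same items differently. Below is a minimal sketch with made-up toy vectors; the helper names `top1_dot` and `top1_angular` are hypothetical and not taken from ann-benchmarks.

```python
import numpy as np

def top1_dot(query, items):
    # Index of the item with the largest inner product with the query.
    return int(np.argmax(items @ query))

def top1_angular(query, items):
    # Index of the item with the largest cosine similarity to the query.
    unit_items = items / np.linalg.norm(items, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    return int(np.argmax(unit_items @ unit_query))

# Toy data (illustrative only): item 0 is long but poorly aligned,
# item 1 is short but points almost in the query direction.
items = np.array([[10.0, 0.0],
                  [0.6, 0.8]])
query = np.array([1.0, 1.0])

print("best by dot product:", top1_dot(query, items))
print("best by angular:", top1_angular(query, items))
```

Here the long vector wins under the dot product while the well-aligned one wins under angular distance, so ground truth computed with one measure is not valid for the other.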

thomasahle avatar Apr 27 '23 06:04 thomasahle

Yeah, this is very confusing – I think it's a mistake. https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/datasets.py#L427 indicates it's angular (cosine) distance too.

Maybe let's remove this dataset from the benchmarks for now.

erikbern avatar Apr 27 '23 13:04 erikbern

@benfred should be able to shed some light on this.

maumueller avatar Apr 27 '23 14:04 maumueller

The original intent was to test out inner-product distance (dot), not angular distance: https://github.com/erikbern/ann-benchmarks/pull/91 .

IIRC, the rationale was that certain algorithms either didn't support IP distance or didn't perform well when applying transforms like https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/XboxInnerProduct.pdf to convert IP distance to a cosine space.
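For readers unfamiliar with that kind of transform, here is a rough sketch of the general idea as I understand it from the linked paper: append one extra coordinate to every item so that all items end up with the same norm, and append a zero to the query. Cosine ranking in the augmented space then matches inner-product ranking in the original space. The function names and the random data are assumptions for illustration, not code from ann-benchmarks or from any benchmarked library.

```python
import numpy as np

def augment_items(items):
    # Pad each item with sqrt(M^2 - ||x||^2), where M is the largest item
    # norm, so every augmented item has norm M.
    norms_sq = np.sum(items ** 2, axis=1)
    max_sq = norms_sq.max()
    extra = np.sqrt(np.maximum(max_sq - norms_sq, 0.0))
    return np.hstack([items, extra[:, None]])

def augment_query(query):
    # The query gets a zero, so its inner products are unchanged.
    return np.append(query, 0.0)

# Synthetic data with widely varying norms (illustrative only).
rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 64)) * rng.uniform(0.1, 5.0, size=(1000, 1))
query = rng.normal(size=64)

# Top 10 by inner product in the original space.
by_dot = np.argsort(-(items @ query))[:10]

# Top 10 by cosine similarity in the augmented space.
aug_items = augment_items(items)
aug_query = augment_query(query)
cos = (aug_items @ aug_query) / (
    np.linalg.norm(aug_items, axis=1) * np.linalg.norm(aug_query)
)
by_cos = np.argsort(-cos)[:10]

print(np.array_equal(by_dot, by_cos))
```

The exact ranking is preserved, but the augmented items can crowd together in direction when norms vary a lot, which may be one reason some angular indexes perform poorly after the transform.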

benfred avatar Apr 27 '23 16:04 benfred

I think it's nice to have a dataset for dot products. But I'll fix that after I'm done with this run.

erikbern avatar Apr 27 '23 18:04 erikbern