
Support 'logit' scale for x-axis

Open thomasahle opened this issue 4 years ago • 20 comments

The matplotlib logit scale is very nice when plotting values between 0 and 1. In particular, this fixes the current problem of plots being incomprehensible in the "high recall" regime.

I suggest changing the --x-log and --y-log parameters of plot.py to --x-scale and --y-scale, each taking a scale name that is passed straight to matplotlib. This would also support the symlog scale.

I have a commit with this change here: https://github.com/thomasahle/ann-benchmarks/commit/3bf07f41715a8cfe64b4c7c7bfdaa46027b2892b
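For readers who don't want to open the commit, here is a minimal sketch of the idea. It is not the code from the commit: `build_parser` and `apply_scales` are hypothetical names, and the list of choices is just a subset of the scale names matplotlib accepts.

```python
import argparse

# A subset of the scale names matplotlib registers by default.
SCALES = ["linear", "log", "symlog", "logit"]


def build_parser():
    # Hypothetical stand-in for the argument parsing in plot.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--x-scale", choices=SCALES, default="linear",
                        help="scale name forwarded to ax.set_xscale")
    parser.add_argument("--y-scale", choices=SCALES, default="linear",
                        help="scale name forwarded to ax.set_yscale")
    return parser


def apply_scales(ax, args):
    # The strings are handed to matplotlib unchanged.
    ax.set_xscale(args.x_scale)
    ax.set_yscale(args.y_scale)


args = build_parser().parse_args(["--x-scale", "logit", "--y-scale", "log"])
```

The old boolean flags could presumably be kept as aliases for `--x-scale log` and `--y-scale log` if backwards compatibility matters.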

thomasahle avatar Jan 10 '21 12:01 thomasahle

But will that work when the accuracy is exactly 1 or 0? Won't those be projected to +-infinity?

Do you have a screenshot you can show?

erikbern avatar Jan 11 '21 22:01 erikbern

I haven't run all the algos, but I do think the logit plot makes things easier to read:

glove-100-angular

In particular, I think it would help for a plot like http://ann-benchmarks.com/glove-25-angular_10_angular.html, which is currently incomprehensible.

You are right that the brute force algorithm with its recall=1 is a problem... Perhaps one could make a custom xscale that adds 0 and 1 to logit by surgery...
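To make the problem concrete, here is a small sketch of the logit transform in plain Python (written by hand for illustration, not matplotlib's implementation). The endpoints have no finite image, so a point with recall exactly 1 has nowhere to go on the axis.

```python
import math


def logit(p):
    # logit(p) = log(p / (1 - p)); finite only for 0 < p < 1.
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))


for recall in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(recall, logit(recall))
```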

thomasahle avatar Jan 11 '21 23:01 thomasahle

I think something that warps [0, 1] using roughly x^10 might also be OK. That would project 0.9 to 0.35 and 0.99 to 0.9
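A quick check of those numbers, as a sketch (the function name `warp` is made up here). Unlike logit, this mapping keeps 0 and 1 at finite positions.

```python
def warp(x, power=10):
    # Maps [0, 1] onto [0, 1], compressing low values and spreading out values near 1.
    return x ** power


for recall in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"{recall} -> {warp(recall):.2f}")
```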

Not sure how hard it is to implement a custom scale. I looked at it a while ago and it seemed like a fair amount of boilerplate.

erikbern avatar Jan 12 '21 15:01 erikbern

It turns out to be pretty easy in matplotlib 3.3 - and upgrading doesn't break anything in ann-benchmarks.

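alpha = 3  # larger values stretch the high-recall end more
# The 'function' scale takes a (forward, inverse) pair; ax is an existing matplotlib Axes.
ax.set_xscale('function', functions=(
    (lambda x: 1-(1-x)**(1/alpha)),
    (lambda x: 1-(1-x)**alpha)))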

This produces the following output:

glove-100-angular-logit

Varying alpha can "stretch" the high-recall regime more or less.
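As a sanity check, here is a sketch in plain Python (no matplotlib needed; `make_scale` and `high_recall_share` are names invented for this example). It confirms that the two lambdas are inverses of each other and shows how much of the axis ends up devoted to recall above 0.9 for a few values of alpha.

```python
def make_scale(alpha):
    # Same forward/inverse pair as in the set_xscale call.
    forward = lambda x: 1 - (1 - x) ** (1 / alpha)
    inverse = lambda y: 1 - (1 - y) ** alpha
    return forward, inverse


def high_recall_share(alpha, threshold=0.9):
    # Fraction of the [0, 1] axis length occupied by recall values above threshold.
    forward, _ = make_scale(alpha)
    return 1 - forward(threshold)


for alpha in (1, 2, 3):
    forward, inverse = make_scale(alpha)
    roundtrip = inverse(forward(0.9))
    print(alpha, round(roundtrip, 6), round(high_recall_share(alpha), 3))
```

With alpha=1 the scale is linear, so recall above 0.9 gets 10% of the axis; the share grows as alpha increases.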

It might be nice to fix the ticks as well, so there isn't a big blank space.

thomasahle avatar Jan 13 '21 17:01 thomasahle

This is the standard linear scale for comparison.

glove-100-angular

thomasahle avatar Jan 13 '21 22:01 thomasahle

Nice! I think we might want to throw in minor gridlines, but looks pretty good!!

erikbern avatar Jan 13 '21 22:01 erikbern

This is alpha=2.

With minor grid lines:

glove-100-angular-logit

With evenly spaced major gridlines:

glove-100-angular-logit

(not sure the minor gridlines make sense here, since you can't guess what values they represent.)
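One way to get evenly spaced major gridlines, sketched under the assumption that the scale is the same 1-(1-x)^(1/alpha) transform: pick positions that are equally spaced on the transformed axis and map them back to recall values with the inverse. The helper name `evenly_spaced_ticks` is made up, and this is not necessarily how the plot above was produced.

```python
def evenly_spaced_ticks(alpha, n):
    # Positions i/n are equally spaced on the transformed axis;
    # the inverse transform 1 - (1 - y)**alpha gives the recall value at each one.
    return [1 - (1 - i / n) ** alpha for i in range(n + 1)]


ticks = evenly_spaced_ticks(alpha=2, n=5)
print([round(t, 2) for t in ticks])
```

The resulting values could then be passed to `ax.set_xticks(ticks)` on an existing matplotlib Axes.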

Are your result hdf5 files available for download? Then I could check what looks better for the full plot (without having to run days of benchmarks myself :))

thomasahle avatar Jan 14 '21 12:01 thomasahle

Hi Thomas

You can access the groundtruth for glove-100-angular via aws s3 cp --recursive s3://ann-benchmarks.com/results/2020-07-13/glove-100-angular/ ..

I would also like to see it with a few more algorithms. I honestly don't know how important it is to put that much focus on the high recall regime. I think the linear scale is still easier to interpret, but I'm clearly biased after having looked at these plots for years :-)

maumueller avatar Jan 14 '21 12:01 maumueller

Hi Martin, I get fatal error: Unable to locate credentials.

thomasahle avatar Jan 14 '21 14:01 thomasahle

I see, I hope this will work: http://ann-benchmarks.com/results/glove-100-angular.zip

maumueller avatar Jan 14 '21 15:01 maumueller

It somehow doesn't find those files. It seems to be because they don't have the suffix hdf5. Maybe my ann-benchmarks version is too new? E.g. in glove-100-angular.zip I find glove-100-angular/10/puffinn/angular_1073741824_fht_crosspolytope_0_1, but when I run ann-benchmarks myself, it generates files like glove-100-angular-mine/10/scann/1000_0_2_1_1_10.hdf5.

thomasahle avatar Jan 14 '21 16:01 thomasahle

Oh, right. That was a recent change. Just rename all of them to contain the suffix.
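A minimal sketch for the renaming, assuming the zip was unpacked into a directory such as glove-100-angular/ and that every file below it is a result file (the function name `add_hdf5_suffix` is made up for this example):

```python
from pathlib import Path


def add_hdf5_suffix(root):
    # Append ".hdf5" to every file under root that does not already end with it.
    renamed = []
    # sorted() materializes the listing first, so renaming does not disturb the walk.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix != ".hdf5":
            # Append rather than replace, so names containing dots stay intact.
            target = path.with_name(path.name + ".hdf5")
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `add_hdf5_suffix("glove-100-angular")` would rename the files in place, so it is worth running it on a copy first.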

maumueller avatar Jan 14 '21 18:01 maumueller

Alright, here you go. Certainly a bit more uniform in terms of where "the action" is in the plot. If you upload the data, I can test glove-25 too, on which the effect should be even stronger.

glove-100-angular-logit

thomasahle avatar Jan 14 '21 20:01 thomasahle

I can do that tomorrow (or Erik might be able to tell you how to get s3 to work ;-))

maumueller avatar Jan 14 '21 21:01 maumueller

Alright, here is alpha 3, 2, 1 (same as linear) and logit for glove-25.

glove-25-angular_a3 glove-25-angular_a2 glove-25-angular_alpha=1 glove-25-angular_logit

The plots make it clear that there is a lot of difference between the algorithms in the high recall regime. I probably prefer alpha=3 here, but alpha=2 may be the best compromise for all the datasets. One could also consider having it as a user-controlled option in the interactive plots. For now I'm just suggesting adding the option to the plot.py script.

thomasahle avatar Jan 15 '21 16:01 thomasahle

I also tried doing alpha=4 with axis labels looking more like the logit plot:

glove-25-angular

thomasahle avatar Jan 15 '21 17:01 thomasahle

Nice, I think this looks really great. Happy to merge!

erikbern avatar Jan 15 '21 21:01 erikbern

Any chance of getting this actually on ann-benchmarks.com? :-)

thomasahle avatar Nov 21 '22 19:11 thomasahle

Sure, please update create_website.py to use these plots as well.

maumueller avatar Nov 21 '22 20:11 maumueller

Would be great to have good defaults in the website create script.

I guess it's time to rerun the benchmarks soon!

erikbern avatar Nov 25 '22 03:11 erikbern