Update benchmarks on wiki

Open yebai opened this issue 5 years ago • 18 comments

The benchmark numbers on the wiki are seriously out-of-date, and probably misleading about Turing's performance. Better to update the numbers using the current releases.

https://github.com/TuringLang/Turing.jl/wiki

yebai avatar Dec 01 '19 23:12 yebai

@xukai92 are these models available somewhere? Perhaps we can add them to https://github.com/TuringLang/Turing.jl/tree/master/benchmarks

yebai avatar Dec 02 '19 13:12 yebai

It seems that they are available in an old branch here: https://github.com/TuringLang/TuringExamples/tree/old-models/old-models.

For the benchmark suite, can we add the Stan version as well?

xukai92 avatar Dec 02 '19 14:12 xukai92

> For the benchmark suite, can we add the Stan version as well?

I think so. GitHub Actions is quite generous with build time compared to Travis, so we can run these benchmarks all together and then produce a table on the fly.

yebai avatar Dec 02 '19 14:12 yebai

Sounds good. I will take a look after finishing the remaining issues for AABI in AHMC.

xukai92 avatar Dec 02 '19 14:12 xukai92

Will be fixed via https://github.com/TuringLang/TuringExamples/pull/22

xukai92 avatar Jan 11 '20 02:01 xukai92

Here is a new table we can use

| Model | Stan | Turing |
| --- | --- | --- |
| Gaussian with Unknown Parameters | 0.342 +/- 0.015 | 2.211 +/- 0.061 |
| Hierarchical Poisson | 0.134 +/- 0.068 | 0.325 +/- 0.013 |
| High Dimensional Gaussian | 11.609 +/- 0.306 | 9.766 +/- 0.222 |
| Semi-supervised HMM | 5.033 +/- 0.058 | 463.213 +/- 26.045 |
| LDA | 43.888 +/- 0.504 | 378.762 +/- 7.91 |
| Logistic Regression | 56.15 +/- 2.274 | 3.942 +/- 1.331 |
| Naive Bayes | 13.677 +/- 0.142 | 6.848 +/- 0.144 |
| Stochastic Volatility | 0.918 +/- 0.014 | 75.026 +/- 30.579 |

What's the best place to host it? Not sure if we still want it on the wiki page.

xukai92 avatar Jan 11 '20 22:01 xukai92

Let's make a nice table and put it on the front page of turing.ml, with a link to the script that reproduces all the numbers.

yebai avatar Jan 11 '20 22:01 yebai

Slightly improved the table. Any other changes to make?

xukai92 avatar Jan 11 '20 23:01 xukai92

cc @trappmartin and @cpfiffer, who might have ideas/suggestions regarding how to format and publish this benchmarking result on the front page.

yebai avatar Jan 11 '20 23:01 yebai

Here is an example, Julia's benchmarking page: https://julialang.org/benchmarks/

yebai avatar Jan 11 '20 23:01 yebai

Thanks for the pointer. I can make the visualization. I will also improve the table a bit more; I've got an idea.

xukai92 avatar Jan 11 '20 23:01 xukai92

It's a bit hard to make the markdown table look nice, as whitespace would be ignored. Plain text actually looks nice.

PPL                              Turing             Stan
Model
10,000D Gaussian         9.766 ±  0.222  11.609 ±  0.306
Gaussian Unknown         2.211 ±  0.061   0.342 ±  0.015
Hierarchical Poisson     0.325 ±  0.013   0.134 ±  0.068
LDA                    378.762 ±  7.910  43.888 ±  0.504
Logistic Regression      3.942 ±  1.331   56.15 ±  2.274
Naive Bayes              6.848 ±  0.144  13.677 ±  0.142
Semi-Supervised HMM    463.213 ± 26.045   5.033 ±  0.058
Stochastic Volatility   75.026 ± 30.579   0.918 ±  0.014

UPDATES

  • The Turing column comes first now
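
A fixed-width layout like the one above can be produced with ordinary string formatting. The following is only a sketch: the `format_row` helper and its column widths are assumptions for illustration, not the script that generated the table.

```python
def format_row(model, turing, stan, name_width=22):
    """Format one table row; turing and stan are (mean, std) pairs."""
    # Fixed-width numeric cells keep the columns aligned in plain text.
    cells = [f"{mean:8.3f} ± {std:6.3f}" for mean, std in (turing, stan)]
    return f"{model:<{name_width}}" + "  ".join(cells)


# Three rows taken from the table in this thread.
rows = [
    ("10,000D Gaussian", (9.766, 0.222), (11.609, 0.306)),
    ("Naive Bayes", (6.848, 0.144), (13.677, 0.142)),
    ("Semi-Supervised HMM", (463.213, 26.045), (5.033, 0.058)),
]

lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

Because every cell has the same width, the rows line up in any monospaced rendering, which a markdown table does not guarantee.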

xukai92 avatar Jan 12 '20 00:01 xukai92

Turing should probably be the first column, and we should order them by which models Turing performs better in.
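
One way to get that ordering, as a minimal sketch: rank the models by the Stan-to-Turing ratio of the mean values. This assumes the numbers are running times (lower is better), which the thread does not state explicitly; the `timings` dictionary and the function name are illustrative only.

```python
# Mean values copied from the table in this thread: model -> (turing, stan).
timings = {
    "10,000D Gaussian": (9.766, 11.609),
    "Gaussian Unknown": (2.211, 0.342),
    "Hierarchical Poisson": (0.325, 0.134),
    "LDA": (378.762, 43.888),
    "Logistic Regression": (3.942, 56.15),
    "Naive Bayes": (6.848, 13.677),
    "Semi-Supervised HMM": (463.213, 5.033),
    "Stochastic Volatility": (75.026, 0.918),
}


def order_by_turing_advantage(timings):
    """Return (model, stan / turing) pairs, largest ratio first."""
    # A ratio above 1 means Turing's number is the smaller of the two.
    ratios = [(model, stan / turing) for model, (turing, stan) in timings.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


ranked = order_by_turing_advantage(timings)
for model, ratio in ranked:
    print(f"{model:<24}{ratio:8.2f}")
```

Under that assumption, the models with a ratio above 1 are the ones where Turing comes out ahead, so they would appear at the top of the table.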

cpfiffer avatar Jan 12 '20 00:01 cpfiffer

Also made a plot

[Figure: results]

Y-axis is in log scale

xukai92 avatar Jan 12 '20 01:01 xukai92

Maybe you could add error bars of the standard deviation to the plot?

devmotion avatar Jan 12 '20 08:01 devmotion

Would it be possible to add some benchmarks that evaluate how Stan and Turing perform with an increasing number of observations? Basically, a line plot with the number of observations on the x-axis.
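
A rough sketch of what such a scaling benchmark could look like: time a run at several data sizes and report the mean and standard deviation per size. Everything here is hypothetical, including `time_over_sizes` and the `toy_workload` stand-in; a real version would call the actual Turing or Stan inference in place of the toy function.

```python
import statistics
import time


def time_over_sizes(run, sizes, repeats=5):
    """Time run(n) `repeats` times for each n; return {n: (mean_s, std_s)}."""
    results = {}
    for n in sizes:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)
            samples.append(time.perf_counter() - start)
        results[n] = (statistics.mean(samples), statistics.stdev(samples))
    return results


def toy_workload(n):
    # Hypothetical stand-in for "run inference on n observations".
    return sum(i * i for i in range(n))


results = time_over_sizes(toy_workload, sizes=[1_000, 10_000, 100_000])
for n, (mean_s, std_s) in results.items():
    print(f"{n:>8}  {mean_s:.6f} ± {std_s:.6f}")
```

The keys of `results` would give the x-axis of the line plot, the means the y-values, and the standard deviations the error bars.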

trappmartin avatar Jan 12 '20 09:01 trappmartin

> Maybe you could add error bars of the standard deviation to the plot?

Sure

> Would it be possible to add some benchmarks that evaluate how Stan and Turing perform with an increasing number of observations? Basically, a line plot with the number of observations on the x-axis.

Sure. But let's first improve the models where we are slow. Otherwise they are hard to benchmark (inference time is too long).

xukai92 avatar Jan 12 '20 16:01 xukai92

I've copied and pasted the current table and figure to the wiki.

xukai92 avatar Feb 09 '20 16:02 xukai92