BenchmarkTools.jl

use legitimate non-iid hypothesis testing

Open jrevels opened this issue 6 years ago • 0 comments

It's unlikely I'll get around to doing this in the foreseeable future, but I'm tired of digging through issues to find this comment when I want to link it in other discussions. Recreated from my comment here:

Robust hypothesis testing is quite tricky to do correctly in the realm of non-i.i.d. statistics, which is the world benchmark timings generally live in. If you do the "usual calculations" (the ones that assume i.i.d. samples), you'll end up getting junk results a lot of the time.
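As a rough illustration of how the i.i.d. assumption can go wrong, the sketch below simulates two series drawn from the *same* process and counts how often a naive two-sample z-test declares them different. The AR(1) process is only a stand-in assumption for serially correlated timings, not a model of real benchmark data, and every name here (`ar1_series`, `naive_z_rejects`, `false_positive_rate`) is hypothetical.

```python
import math
import random
import statistics


def ar1_series(n, phi, rng):
    """Draw n values from a stationary AR(1) process (assumed stand-in for timings)."""
    # Start from the stationary distribution so there is no burn-in transient.
    x = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi * phi)
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def naive_z_rejects(a, b, crit=1.96):
    """Two-sample z-test whose standard error assumes i.i.d. observations."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(statistics.fmean(a) - statistics.fmean(b)) / se > crit


def false_positive_rate(phi, trials=400, n=200, seed=0):
    """Fraction of trials in which two same-process series are called different."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = ar1_series(n, phi, rng)
        b = ar1_series(n, phi, rng)
        hits += naive_z_rejects(a, b)
    return hits / trials


if __name__ == "__main__":
    print("phi=0.0:", false_positive_rate(0.0))  # independent samples
    print("phi=0.9:", false_positive_rate(0.9))  # strongly autocorrelated samples
```

With independent samples the rejection rate should sit near the nominal 5%. With strong positive autocorrelation the naive standard error is too small, so the same test rejects far more often even though there is no real difference.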

A while ago, I developed a working prototype of a subsampling method for calculating p-values (which could be modified to compute confidence intervals), but it relies on getting the correct normalization coefficient for the combination of test statistic and timing distribution, which is unique to each benchmark. IIRC, it worked decently on my test benchmark data, but only if I manually tuned the normalization coefficient for each benchmark. There are methods out there for automatically estimating this coefficient, but I never got around to implementing them. For a reference, see Politis, Romano, and Wolf's book "Subsampling" (specifically chapter 8: "Subsampling with Unknown Convergence Rate").
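For readers unfamiliar with the technique, here is a minimal sketch of a subsampling p-value in the general style of the referenced book. It is not the prototype described above. It assumes the normalization takes the form `tau_n = n ** beta`, with `beta` playing the role of the coefficient that has to be tuned per benchmark, and it assumes contiguous blocks are used so that serial dependence inside each subsample is preserved. The function name, the default block size, and the default `beta=0.5` are all illustrative choices, not recommendations.

```python
import statistics


def subsampling_p_value(series, theta0=0.0, block=None, beta=0.5,
                        stat=statistics.median):
    """Two-sided subsampling p-value for H0: stat(process) == theta0.

    series : list of observations in the order they were collected
    block  : subsample length b (defaults to roughly sqrt(n), an assumption)
    beta   : assumed convergence-rate exponent, tau_n = n ** beta
    """
    n = len(series)
    b = block if block is not None else max(2, int(round(n ** 0.5)))
    if not 2 <= b < n:
        raise ValueError("block size must satisfy 2 <= b < n")

    theta_n = stat(series)
    # Normalized test statistic on the full sample.
    t_full = (n ** beta) * abs(theta_n - theta0)

    # Recompute the statistic on every contiguous block of length b and
    # center it at the full-sample estimate, then compare after rescaling.
    total = n - b + 1
    exceed = 0
    for start in range(total):
        theta_b = stat(series[start:start + b])
        if (b ** beta) * abs(theta_b - theta_n) >= t_full:
            exceed += 1
    return exceed / total
```

A possible use is to pass the element-wise differences between two timing series and test `theta0=0.0`. The result is sensitive to `beta`: if the exponent does not match the actual convergence rate of the statistic, the rescaled subsample values are not comparable to the full-sample one and the p-value is unreliable, which is consistent with the need for manual tuning mentioned above.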

jrevels, Sep 05 '17 15:09