benchstat: bogus "no statistical difference" report when all the times are the same

old.txt:

BenchmarkFloatSub/100-4         20000000           115 ns/op          64 B/op          1 allocs/op
BenchmarkFloatSub/100-4         20000000           114 ns/op          64 B/op          1 allocs/op
BenchmarkFloatSub/100-4         20000000           115 ns/op          64 B/op          1 allocs/op
BenchmarkFloatSub/100-4         20000000           115 ns/op          64 B/op          1 allocs/op
BenchmarkFloatSub/100-4         20000000           115 ns/op          64 B/op          1 allocs/op
PASS
ok      math/big    101.306s

new.txt (note that all the times are the same: 78.8 ns/op):

BenchmarkFloatSub/100-4         20000000            78.8 ns/op         0 B/op          0 allocs/op
BenchmarkFloatSub/100-4         20000000            78.8 ns/op         0 B/op          0 allocs/op
BenchmarkFloatSub/100-4         20000000            78.8 ns/op         0 B/op          0 allocs/op
BenchmarkFloatSub/100-4         20000000            78.8 ns/op         0 B/op          0 allocs/op
BenchmarkFloatSub/100-4         20000000            78.8 ns/op         0 B/op          0 allocs/op

PASS
ok      math/big    88.135s

benchstat old.txt new.txt gives

name            old time/op    new time/op    delta
FloatSub/100-4     115ns ± 0%      79ns ± 0%      ~     (p=0.079 n=4+5)

i.e. reports "no statistically significant improvement", which is clearly wrong.

Aug 16 '16 12:08 ALTree

I've also run into this, often getting 6 runs collapsed into 4-5 because a few are identical. This must be done on purpose, though, which makes me wonder what the reasoning behind it is.

Sep 07 '16 14:09 mvdan

The Mann-Whitney test (the default) can't be used on samples with zero variance. The code of benchstat checks

if testerr == stats.ErrZeroVariance

on line 190 and reports "zero variance" (which is more useful than the ~) if the condition is true. However, the documentation for aclements/go-moremath/stats suggests that the MannWhitney function actually returns ErrSamplesEqual, so benchstat does not catch the error and just reports ~.

Sep 08 '16 19:09 ALTree

I think you need to run with -delta-test=none. That skips the significance test and shows the plain delta.

Jul 20 '22 10:07 harkishen