Bogus "no statistical difference" report when all the times are the same
old.txt:
BenchmarkFloatSub/100-4 20000000 115 ns/op 64 B/op 1 allocs/op
BenchmarkFloatSub/100-4 20000000 114 ns/op 64 B/op 1 allocs/op
BenchmarkFloatSub/100-4 20000000 115 ns/op 64 B/op 1 allocs/op
BenchmarkFloatSub/100-4 20000000 115 ns/op 64 B/op 1 allocs/op
BenchmarkFloatSub/100-4 20000000 115 ns/op 64 B/op 1 allocs/op
PASS
ok math/big 101.306s
new.txt (note that all the times are the same: 78.8 ns/op):
BenchmarkFloatSub/100-4 20000000 78.8 ns/op 0 B/op 0 allocs/op
BenchmarkFloatSub/100-4 20000000 78.8 ns/op 0 B/op 0 allocs/op
BenchmarkFloatSub/100-4 20000000 78.8 ns/op 0 B/op 0 allocs/op
BenchmarkFloatSub/100-4 20000000 78.8 ns/op 0 B/op 0 allocs/op
BenchmarkFloatSub/100-4 20000000 78.8 ns/op 0 B/op 0 allocs/op
PASS
ok math/big 88.135s
benchstat old.txt new.txt gives
name old time/op new time/op delta
FloatSub/100-4 115ns ± 0% 79ns ± 0% ~ (p=0.079 n=4+5)
i.e. reports "no statistically significant improvement", which is clearly wrong.
I've also run into this, often seeing 6 runs reduced to 4-5 when a few of them are identical. This must be done on purpose, though, which makes me wonder what the reasoning behind it is.
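For what it's worth, the shrinking sample count looks more like outlier filtering than collapsing of duplicates: as far as I know, benchstat discards values that fall outside 1.5×IQR of the quartiles before computing statistics. Below is a minimal sketch of that rule, assuming linearly interpolated quartiles. The helper names percentile and dropOutliers are mine, not benchstat's, and the exact quantile method benchstat uses may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-quantile (0 <= p <= 1) of an already sorted
// slice, using linear interpolation between neighbouring values.
// Assumption: this interpolation scheme is for illustration only.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	pos := p * float64(len(sorted)-1)
	i := int(pos)
	if i >= len(sorted)-1 {
		return sorted[len(sorted)-1]
	}
	frac := pos - float64(i)
	return sorted[i] + frac*(sorted[i+1]-sorted[i])
}

// dropOutliers keeps only the values inside [Q1-1.5*IQR, Q3+1.5*IQR].
func dropOutliers(xs []float64) []float64 {
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	q1, q3 := percentile(sorted, 0.25), percentile(sorted, 0.75)
	lo, hi := q1-1.5*(q3-q1), q3+1.5*(q3-q1)
	var kept []float64
	for _, x := range xs {
		if lo <= x && x <= hi {
			kept = append(kept, x)
		}
	}
	return kept
}

func main() {
	oldTimes := []float64{115, 114, 115, 115, 115} // ns/op from old.txt
	newTimes := []float64{78.8, 78.8, 78.8, 78.8, 78.8} // ns/op from new.txt
	fmt.Println(len(dropOutliers(oldTimes)), len(dropOutliers(newTimes))) // prints "4 5"
}
```

With the old.txt timings both quartiles are 115, so the IQR is zero and the single 114 ns/op run is rejected, which would match the n=4+5 in the report above.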
The Mann-Whitney test (the default) can't be used on samples with zero variance. The code of benchstat checks
if testerr == stats.ErrZeroVariance
on line 190 and reports "zero variance" (which is more useful than the ~) if the condition is true. However, judging by the documentation of aclements/go-moremath/stats, the MannWhitney function actually returns ErrSamplesEqual in this situation, so benchstat does not catch it and just reports ~.
I think you need to run with -delta-test=none. That will show the plain delta without applying a significance test.