Why is the benchmark reporting values 10 times higher?
I understand that it's beneficial (for the sake of accuracy) to run the measured function 10 times in a loop. But why is the total the value that is actually reported? Why not report the value divided by 10? It is really confusing!
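To make the confusion concrete, here is a minimal sketch of the scaling being discussed. It is an illustration only, not the real harness API: the function and parameter names (`run_timed`, `inner_iterations`) are invented, and the assumption is simply that the harness times a 10-iteration inner loop and reports the elapsed total rather than the per-call average.

```python
import time

def run_timed(f, inner_iterations=10):
    """Time `inner_iterations` calls of f and return the TOTAL elapsed
    microseconds -- this total is the '10x higher' value being reported."""
    start = time.perf_counter()
    for _ in range(inner_iterations):
        f()
    return (time.perf_counter() - start) * 1e6

def per_call(f, inner_iterations=10):
    """What the reporter expected instead: the total divided by the
    number of inner iterations, i.e. an average per-call time."""
    return run_timed(f, inner_iterations) / inner_iterations
```

So for a function that takes roughly 1 ms per call, `run_timed` reports on the order of 10,000 µs while `per_call` reports on the order of 1,000 µs, which is the 10× discrepancy in the title.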
+1, I think it's worth going to 2.0.0 to fix this
cc @johnmccutchan for insight
This is legacy that we should not change. The purpose of this harness is to replicate the exact same benchmark conditions that we use internally. All historical data has this same scaling in place.
If you have a benchmark that runs under this harness and you speed it up (or slow it down), you can see the relative change to your base line.
tl;dr: this isn't a "code timer" but a benchmark runner that is designed to match our internal benchmarking infrastructure.
Internally we could stay on 1.0.4 though right? It seems wrong to force this behavior on all users of this package. Or maybe we could hide it behind an option?
@jakemac53 If the external version changes it makes it impossible for us to compare results to our internal numbers.
I'll discuss what we want to do long term with this package at the next compiler team meeting.
Ok, sounds good. My main concern is that this package currently advertises itself as the officially endorsed package for writing Dart benchmarks, but it really seems like it's just for internal use if we can't ever make changes that would throw off our historical measurements. Instead, it seems like users should pin themselves to a particular version and choose to upgrade when the benefits of the new features outweigh the cost of having to normalize their historical data.
Seems like a flag would work. If the examples and internal benchmarks set the same flag, the results would be comparable, without affecting other benchmarks.
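A rough sketch of how such a flag could work, continuing the illustration above. The flag name `legacy_scaling` is invented here; the only assumption is that internal benchmarks would opt in to keep the historical 10× totals, while the default becomes the per-call value.

```python
def report_score(total_elapsed_us, inner_iterations=10, legacy_scaling=False):
    """Convert a timed total into the reported score.

    legacy_scaling=True  -> report the raw total, matching historical
                            internal data (the current 10x behavior).
    legacy_scaling=False -> report the per-call average (new default).
    """
    if legacy_scaling:
        return total_elapsed_us
    return total_elapsed_us / inner_iterations
```

With this shape, internal benchmarks and the examples set `legacy_scaling=True` and stay comparable to historical numbers, while everyone else gets the intuitive per-call value by default.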