noisy benchmarks
I regularly see 10% swings in performance between runs. Typically, if I run the benches 3 times, the results are spread by at least 10%.
Is anyone else seeing this? I haven't figured out why yet, but the benchmarks are nearly useless if we can't reliably measure a 10% change in behavior.
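To be concrete about what I mean by "spread": take the time reported for the same bench on each run and compare the slowest to the fastest. Here is a minimal sketch of that arithmetic. The function name `relative_spread` is made up for illustration and is not part of criterion or this crate.

```rust
/// Relative spread of repeated timings of the same bench:
/// (slowest - fastest) / fastest.
///
/// Hypothetical helper for illustration only. Returns `None` for an
/// empty slice or when the fastest timing is not positive.
fn relative_spread(timings: &[f64]) -> Option<f64> {
    if timings.is_empty() {
        return None;
    }
    // Fold over the samples to find the fastest and slowest run.
    let fastest = timings.iter().copied().fold(f64::INFINITY, f64::min);
    let slowest = timings.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if fastest <= 0.0 {
        return None;
    }
    Some((slowest - fastest) / fastest)
}
```

Any result of 0.10 or more across three back-to-back runs of unchanged code is the kind of noise I'm describing.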
P.S. I noticed this while trying to evaluate #37
Addendum:
bench encode: 1000 coordinates
time: [55.068 µs 55.138 µs 55.227 µs]
change: [-0.7790% -0.4607% -0.1246%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
8 (8.00%) high mild
1 (1.00%) high severe
bench decode: 21502 coordinates
time: [180.43 µs 181.44 µs 182.45 µs]
change: [+0.5481% +1.3255% +2.1812%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
bench polyline6 decoding
time: [40.294 ns 40.948 ns 41.776 ns]
change: [+1.1613% +1.8216% +2.7833%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
bench HUGE polyline6 decoding
time: [25.830 µs 26.154 µs 26.484 µs]
change: [+9.1293% +12.938% +16.944%] (p = 0.00 < 0.05)
Performance has regressed.
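For anyone reading the verdict lines in this output: as far as I understand criterion's behavior, it first checks the p-value against the significance level (0.05 here), then compares the bounds of the change interval against a noise threshold, which I believe defaults to 1%. The sketch below is my own reconstruction of that rule, not criterion's code, and the names `Verdict` and `classify` are made up. It does reproduce all four verdicts in this output.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// Reconstruction (an assumption, not criterion's actual code) of how a
/// change interval is turned into a verdict line.
///
/// `lower` and `upper` are the outer bounds of the reported change as
/// fractions (0.01 == 1%), with `lower <= upper`.
fn classify(lower: f64, upper: f64, p_value: f64, significance: f64, noise: f64) -> Verdict {
    // Not statistically significant: no change is reported at all.
    if p_value >= significance {
        return Verdict::NoChange;
    }
    if upper < -noise {
        // The whole interval is below -noise.
        Verdict::Improved
    } else if lower > noise {
        // The whole interval is above +noise.
        Verdict::Regressed
    } else {
        // The interval overlaps [-noise, +noise].
        Verdict::WithinNoise
    }
}
```

Under that reading, the small polyline6 bench is flagged as regressed only because its interval (+1.16% to +2.78%) just clears 1%. If run-to-run noise on unchanged code is around 10%, raising the threshold with criterion's `noise_threshold` setting would at least stop the false alarms, though it would not fix the underlying noise.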
The issue is mostly in the bench HUGE polyline6 decoding bench, and less so in bench polyline6 decoding. The other two benches actually seem acceptably stable.
We're interning the string, so we shouldn't be subject to any I/O fluctuations. It's a rather large (24 kB) string, so maybe there's some edge case we're slamming against there.
If there is, I wonder whether it's a problem in our own handling that we can fix, or something out of our hands. In the latter case, I'd be in favor of shrinking the big file until it behaves consistently.
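If we do shrink it, one option is to cut the encoded string at a coordinate boundary rather than regenerate the data. Each encoded value ends at the first character whose 5-bit chunk has the continuation bit (0x20) clear, and each point is stored as a delta from the previous one, so a prefix that ends on a complete lat/lon pair is itself a valid polyline. A rough sketch, assuming well-formed ASCII input; `truncate_polyline` is a hypothetical helper, not something in the crate:

```rust
/// Returns the prefix of `encoded` holding at most `max_coords`
/// coordinate pairs. Hypothetical helper for illustration; assumes
/// `encoded` is a well-formed polyline (ASCII, complete values).
/// Precision (5 or 6) does not matter here, since it only changes scaling.
fn truncate_polyline(encoded: &str, max_coords: usize) -> &str {
    // Each coordinate is two encoded values: latitude, then longitude.
    let target_values = max_coords.saturating_mul(2);
    if target_values == 0 {
        return "";
    }
    let mut values_seen = 0usize;
    for (i, byte) in encoded.bytes().enumerate() {
        // Characters are 5-bit chunks offset by 63; bit 0x20 set means
        // another chunk of the same value follows.
        let chunk = byte.wrapping_sub(63);
        if chunk & 0x20 == 0 {
            values_seen += 1;
            if values_seen == target_values {
                // Fall back to the full string if the cut would not land
                // on a character boundary (malformed, non-ASCII input).
                return encoded.get(..=i).unwrap_or(encoded);
            }
        }
    }
    // Fewer than `max_coords` pairs present: nothing to cut.
    encoded
}
```

Halving `max_coords` repeatedly and re-running the bench would show whether the noise tracks input size or appears past some particular length.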
I gave up on figuring out why they are so noisy and just simplified them a bit in: https://github.com/georust/polyline/issues/42
@mattiZed - as a sanity check, are you also seeing medium-sized (+/- 10%) swings in the bench HUGE polyline6 decoding bench?