Add FixedPointDecimal benchmark.
Opening this PR to discuss merging benchmarks into this repo, so that we can track performance across commits/versions.
I'm not sure if there's a usual structure to follow for putting benchmarks into a Julia repo. Do other repos besides the main Julia repo use Nanosoldier?
Also, in its current form, the benchmark compares against raw Int and Float types, but for all the operations except division, those types execute the operation in a single clock cycle, so it's almost not worth spending the computation to measure them. Maybe we can simplify this file to just measure FixedDecimals.
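To make that concrete, here's the sort of baseline measurement I mean (a quick BenchmarkTools sketch, not code from this PR):

```julia
using BenchmarkTools

# A 64-bit integer multiply is a single machine instruction, so the
# reported sub-nanosecond time is dominated by measurement overhead:
@btime x * y setup=(x = rand(Int64); y = rand(Int64))
```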
Anyway, looking forward to figuring out the best way to do this with you! :)
I think this makes sense. We also have benchmarks in bench/ on JSON.jl: https://github.com/JuliaIO/JSON.jl/tree/master/bench, so there's precedent.
It isn't widely used yet, but there is: https://github.com/JuliaCI/PkgBenchmark.jl
@omus that PkgBenchmark seems nice, thanks for the link. (I'm sending a couple PRs now to clean it up for 1.0 so we can tag a version there. 😄)
Do you want me to play with setting that up before merging this PR in? I think that seems reasonable.
Coverage remained the same at 98.837% when pulling ed2db1770db958a5248b0507ca1df6c52776f074 on NHDaly:bench into 483325ab1f0100937a0d6639bfb9e36ff3597222 on JuliaMath:master.
Okay! I think I've got the benchmarks working via PkgBenchmark.jl, based on https://github.com/JuliaCI/PkgBenchmark.jl/pull/75 going through. (I added a Project.toml pointing at that branch for now, so that we can demo it and see if it makes sense.)
I'll post the results.md file generated here in the next post! :)
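For context, the driver script is only a few lines of standard PkgBenchmark API; roughly this (a sketch, not necessarily the exact contents of the file in this PR):

```julia
# benchmark/runbench.jl -- illustrative sketch
using PkgBenchmark

# Run the suite defined in benchmark/benchmarks.jl for this package:
results = benchmarkpkg("FixedPointDecimals")

# Write out the markdown report (pasted in the next post):
export_markdown("results.md", results)
```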
There are other things we might want to change, such as:
- Maybe truncating the timings to a minimum, like I was doing before, so as to limit noise when judging between commits.
- Or maybe just removing most/all of those types/ops (like, do we really need to be measuring the time for `Int64` multiplication? It's just going to be the same every time!).
- Reducing the number of iterations: with `N` set to `1`, it seems to give pretty consistent results for `FixedDecimal` timings, but other things seem to swing more (like `BigInt` and the regular Integers). With `N` set to `1000` it's more consistent, and it reduced the "noise tolerance" from 5% to 1%, but it makes it pretty slow (takes about 8 min on my machine, vs ~2 min). (See the sketch below for where `N` fits in.)
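For reference, here's roughly where `N` fits into `benchmark/benchmarks.jl` (an illustrative sketch; the `repeatedly` helper and the operand values are mine, not the PR's exact file):

```julia
# benchmark/benchmarks.jl -- illustrative sketch
using BenchmarkTools
using FixedPointDecimals
const FD = FixedPointDecimals.FixedDecimal

const N = 1  # repetitions per measurement; 1000 is steadier but far slower

# Repeating the op N times amortizes per-measurement overhead at the
# cost of a longer run. Multiplying by one avoids overflow in the loop.
function repeatedly(op, x, y, n)
    acc = x
    for _ in 1:n
        acc = op(acc, y)
    end
    return acc
end

const SUITE = BenchmarkGroup()  # PkgBenchmark looks for this constant
SUITE["*"] = BenchmarkGroup()
for T in (Int64, Float64, FD{Int64,2})
    x, y = T(7), T(1)
    SUITE["*"][string(T)] = @benchmarkable repeatedly(*, $x, $y, $N)
end
```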
Okay, here are the results, generated by running `$ julia benchmark/runbench.jl`:
# Benchmark Report for FixedPointDecimals

## Job Properties
- Time of benchmark: 20 Dec 2018 - 13:15
- Package commit: 0dbf53
- Julia commit: d78923
- Julia command flags: None
- Environment variables: None

## Results
Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.

The percentages accompanying time and memory values in the table below are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.
| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| `["*", " Int32"]` | 0.086 ns (1%) | | | |
| `["*", " Int64"]` | 0.235 ns (1%) | | | |
| `["*", " Int128"]` | 0.285 ns (1%) | | | |
| `["*", "BigFloat"]` | 49.889 ns (1%) | 2.421 ns | 112 bytes (1%) | 2 |
| `["*", "BigInt"]` | 261.839 ns (1%) | 62.843 ns | 48 bytes (1%) | 3 |
| `["*", "FD{ Int32,2}"]` | 1.550 ns (1%) | | | |
| `["*", "FD{ Int64,2}"]` | 15.891 ns (1%) | | | |
| `["*", "FD{Int128,2}"]` | 2.070 μs (1%) | 475.085 ns | 456 bytes (1%) | 24 |
| `["*", "Float32"]` | 0.339 ns (1%) | | | |
| `["*", "Float64"]` | 0.215 ns (1%) | | | |
| `["+", " Int32"]` | 0.076 ns (1%) | | | |
| `["+", " Int64"]` | 0.007 ns (1%) | | | |
| `["+", " Int128"]` | 0.180 ns (1%) | | | |
| `["+", "BigFloat"]` | 56.617 ns (1%) | 4.689 ns | 112 bytes (1%) | 2 |
| `["+", "BigInt"]` | 258.110 ns (1%) | 70.857 ns | 48 bytes (1%) | 3 |
| `["+", "FD{ Int32,2}"]` | -0.027 ns (1%) | | | |
| `["+", "FD{ Int64,2}"]` | -0.217 ns (1%) | | | |
| `["+", "FD{Int128,2}"]` | -37.671 ns (1%) | -18.925 ns | | |
| `["+", "Float32"]` | 0.238 ns (1%) | | | |
| `["+", "Float64"]` | 0.231 ns (1%) | | | |
| `["/", " Int32"]` | 3.837 ns (1%) | | | |
| `["/", " Int64"]` | 5.173 ns (1%) | | | |
| `["/", " Int128"]` | 13.514 ns (1%) | | | |
| `["/", "BigFloat"]` | 142.886 ns (1%) | 2.393 ns | 112 bytes (1%) | 2 |
| `["/", "BigInt"]` | 421.201 ns (1%) | 9.256 ns | 464 bytes (1%) | 10 |
| `["/", "FD{ Int32,2}"]` | 5.627 ns (1%) | | | |
| `["/", "FD{ Int64,2}"]` | 20.864 ns (1%) | | | |
| `["/", "FD{Int128,2}"]` | 2.093 μs (1%) | 505.571 ns | 456 bytes (1%) | 24 |
| `["/", "Float32"]` | 0.413 ns (1%) | | | |
| `["/", "Float64"]` | 0.216 ns (1%) | | | |
| `["div", " Int32"]` | 0.000 ns (1%) | | | |
| `["div", " Int64"]` | 0.032 ns (1%) | | | |
| `["div", " Int128"]` | 0.100 ns (1%) | | | |
| `["div", "BigFloat"]` | 117.968 ns (1%) | 2.174 ns | 112 bytes (1%) | 2 |
| `["div", "BigInt"]` | 263.183 ns (1%) | 70.385 ns | 40 bytes (1%) | 2 |
| `["div", "FD{ Int32,2}"]` | -0.125 ns (1%) | | | |
| `["div", "FD{ Int64,2}"]` | 2.543 ns (1%) | | | |
| `["div", "FD{Int128,2}"]` | 484.794 ns (1%) | 102.801 ns | 128 bytes (1%) | 7 |
| `["div", "Float32"]` | 2.540 ns (1%) | | | |
| `["div", "Float64"]` | 2.390 ns (1%) | | | |
| `["identity", " Int32"]` | 0.265 ns (1%) | | | |
| `["identity", " Int64"]` | 0.323 ns (1%) | | | |
| `["identity", " Int128"]` | 0.525 ns (1%) | | | |
| `["identity", "BigFloat"]` | 151.618 ns (1%) | 13.535 ns | 336 bytes (1%) | 6 |
| `["identity", "BigInt"]` | 695.563 ns (1%) | 157.238 ns | 136 bytes (1%) | 8 |
| `["identity", "FD{ Int32,2}"]` | 1.293 ns (1%) | | | |
| `["identity", "FD{ Int64,2}"]` | 1.266 ns (1%) | | | |
| `["identity", "FD{Int128,2}"]` | 604.572 ns (1%) | 137.165 ns | 128 bytes (1%) | 7 |
| `["identity", "Float32"]` | 0.978 ns (1%) | | | |
| `["identity", "Float64"]` | 1.043 ns (1%) | | | |
## Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
- `["*"]`
- `["+"]`
- `["/"]`
- `["div"]`
- `["identity"]`
## Julia versioninfo
```
Julia Version 1.0.2
Commit d789231e99* (2018-11-08 20:11 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.2.0)
  uname: Darwin 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64 i386
  CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz:
         speed         user         nice          sys         idle          irq
  #1-12  2900 MHz     768878 s          0 s     389078 s    9840166 s          0 s
  Memory: 32.0 GB (3027.5859375 MB free)
  Uptime: 269534.0 sec
  Load Avg: 3.58056640625 3.8349609375 3.80224609375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
```
Codecov Report
Merging #42 into master will not change coverage. The diff coverage is n/a.
```
@@           Coverage Diff           @@
##           master      #42   +/-  ##
=======================================
  Coverage   98.83%   98.83%
=======================================
  Files           1        1
  Lines         172      172
=======================================
  Hits          170      170
  Misses          2        2
```
Just want to leave a status update here.

I think this basically works. The benchmarks run, and after the merged changes in https://github.com/JuliaCI/PkgBenchmark.jl/pull/75, they should correctly and precisely measure only the time for each operation (not copying the value, reading from an array, etc.).

The remaining blocker to merging is that the results are extremely variable, so much so that I don't think they're useful. Even when running on a single computer and comparing a single commit against itself, PkgBenchmark consistently reports statistically significant variance. If anyone has any advice about how to diagnose this, that would be much appreciated!
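For anyone who wants to reproduce it, the self-comparison experiment is basically this (a sketch; I'm assuming PkgBenchmark's `judge` accepts two results objects, and the output filename is arbitrary):

```julia
using PkgBenchmark

# Run the full suite twice on the very same commit...
r1 = benchmarkpkg("FixedPointDecimals")
r2 = benchmarkpkg("FixedPointDecimals")

# ...then compare the two runs. With a stable suite everything should
# be judged invariant; instead this consistently flags "changes".
export_markdown("self_judgement.md", judge(r1, r2))
```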
I've tried several things to pinpoint the source of the variance, but haven't had any luck:
- I tried simplifying the benchmarks to just `@benchmarkable $op($x, $x)`, but saw the same variance there.
- I tried statically compiling a sysimg containing FixedPointDecimals, and using that for running the benchmarks, which didn't help.
- One of my coworkers tried disabling the GC (I'm not sure what steps they took; the basic form of that experiment is sketched below), but said that didn't help either.
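For reference, the basic form of that GC experiment, assuming the `SUITE` from `benchmark/benchmarks.jl` has been `include`d, would be:

```julia
using BenchmarkTools

# Run the suite with automatic collection switched off, then restore it.
GC.enable(false)
try
    run(SUITE; verbose = true)
finally
    GC.enable(true)
end
```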
Does anyone have any other ideas? Without a fix for this, the benchmarks don't seem very useful: sometimes the swings are as large as 100% or 200%, so I'm not sure we'd get meaningful feedback on PRs.
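One knob I haven't exhausted yet (these are real BenchmarkTools parameters, but the values below are guesses I haven't validated): trade run time for stability by giving each benchmark a bigger time budget and heavier samples.

```julia
using BenchmarkTools

# Fewer-but-heavier samples, so OS/scheduler noise averages out
# within each sample (at the cost of a slower overall run).
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 10    # time budget per benchmark (default 5)
BenchmarkTools.DEFAULT_PARAMETERS.samples = 100   # max samples per benchmark (default 10_000)
BenchmarkTools.DEFAULT_PARAMETERS.evals   = 1000  # evals per sample (tune! may override this)
```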