Add FixedPointDecimal benchmark.
Opening this PR to discuss merging benchmarks into this repo, so that we can track performance across commits/versions.
I'm not sure if there's a usual structure to follow for putting benchmarks into a Julia repo. Do other repos besides the main Julia repo use Nanosoldier?
Also, in its current form, the benchmark compares against raw Int and Float types, but for all the operations except division, those types execute the operation in a single clock cycle, so it's almost not worth spending the computation to measure them. Maybe we can simplify this file to just measure FixedDecimals.
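To make that concrete, here's the sort of baseline measurement I mean (a quick BenchmarkTools sketch, not code from this PR):

```julia
using BenchmarkTools

# A 64-bit integer multiply is a single machine instruction, so the
# reported sub-nanosecond time is dominated by measurement overhead:
@btime x * y setup=(x = rand(Int64); y = rand(Int64))
```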
Anyway, looking forward to figuring out the best way to do this with you! :)
I think this makes sense. We also have benchmarks in bench/ on JSON.jl: https://github.com/JuliaIO/JSON.jl/tree/master/bench, so there's precedent.
It isn't widely used yet, but there is: https://github.com/JuliaCI/PkgBenchmark.jl
@omus that PkgBenchmark seems nice, thanks for the link. (I'm sending a couple PRs now to clean it up for 1.0 so we can tag a version there. 😄)
Do you want me to play with setting that up before merging this PR in? I think that seems reasonable.
Coverage remained the same at 98.837% when pulling ed2db1770db958a5248b0507ca1df6c52776f074 on NHDaly:bench into 483325ab1f0100937a0d6639bfb9e36ff3597222 on JuliaMath:master.
Okay! I think I've got the benchmarks working via PkgBenchmark.jl, based on https://github.com/JuliaCI/PkgBenchmark.jl/pull/75 going through. (I added a Project.toml pointing at that branch for now, so that we can demo it and see if it makes sense.)
I'll post the results.md file generated here in the next post! :)
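For context, the driver script is only a few lines of standard PkgBenchmark API; roughly this (a sketch, not necessarily the exact contents of the file in this PR):

```julia
# benchmark/runbench.jl -- illustrative sketch
using PkgBenchmark

# Run the suite defined in benchmark/benchmarks.jl for this package:
results = benchmarkpkg("FixedPointDecimals")

# Write out the markdown report (pasted in the next post):
export_markdown("results.md", results)
```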
There are other things we might want to change, such as:
- Maybe truncating the timings to a minimum, like I was doing before, so as to limit noise when judging between commits.
- Or maybe just removing most/all of those types/ops (like, do we really need to be measuring the time for `Int64` multiplication? It's just going to be the same every time!).
- Reducing the number of iterations: with `N` set to `1`, it seems to give pretty consistent results for `FixedDecimal` timings, but other things seem to swing more (like `BigInt` and the regular Integers). With `N` set to `1000` it's more consistent, and it reduced the "noise tolerance" from 5% to 1%, but it makes it pretty slow (takes about 8 min on my machine, vs ~2 min). (See the sketch below for where `N` fits in.)
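For reference, here's roughly where `N` fits into `benchmark/benchmarks.jl` (an illustrative sketch; the `repeatedly` helper and the operand values are mine, not the PR's exact file):

```julia
# benchmark/benchmarks.jl -- illustrative sketch
using BenchmarkTools
using FixedPointDecimals
const FD = FixedPointDecimals.FixedDecimal

const N = 1  # repetitions per measurement; 1000 is steadier but far slower

# Repeating the op N times amortizes per-measurement overhead at the
# cost of a longer run. Multiplying by one avoids overflow in the loop.
function repeatedly(op, x, y, n)
    acc = x
    for _ in 1:n
        acc = op(acc, y)
    end
    return acc
end

const SUITE = BenchmarkGroup()  # PkgBenchmark looks for this constant
SUITE["*"] = BenchmarkGroup()
for T in (Int64, Float64, FD{Int64,2})
    x, y = T(7), T(1)
    SUITE["*"][string(T)] = @benchmarkable repeatedly(*, $x, $y, $N)
end
```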
Okay, here are the results, generated by running `$ julia benchmark/runbench.jl`:
# Benchmark Report for FixedPointDecimals

## Job Properties
- Time of benchmark: 20 Dec 2018 - 13:15
- Package commit: 0dbf53
- Julia commit: d78923
- Julia command flags: None
- Environment variables: None

## Results
Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.

The percentages accompanying time and memory values in the table below are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.
| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| `["*", " Int32"]` | 0.086 ns (1%) | | | |
| `["*", " Int64"]` | 0.235 ns (1%) | | | |
| `["*", " Int128"]` | 0.285 ns (1%) | | | |
| `["*", "BigFloat"]` | 49.889 ns (1%) | 2.421 ns | 112 bytes (1%) | 2 |
| `["*", "BigInt"]` | 261.839 ns (1%) | 62.843 ns | 48 bytes (1%) | 3 |
| `["*", "FD{ Int32,2}"]` | 1.550 ns (1%) | | | |
| `["*", "FD{ Int64,2}"]` | 15.891 ns (1%) | | | |
| `["*", "FD{Int128,2}"]` | 2.070 μs (1%) | 475.085 ns | 456 bytes (1%) | 24 |
| `["*", "Float32"]` | 0.339 ns (1%) | | | |
| `["*", "Float64"]` | 0.215 ns (1%) | | | |
| `["+", " Int32"]` | 0.076 ns (1%) | | | |
| `["+", " Int64"]` | 0.007 ns (1%) | | | |
| `["+", " Int128"]` | 0.180 ns (1%) | | | |
| `["+", "BigFloat"]` | 56.617 ns (1%) | 4.689 ns | 112 bytes (1%) | 2 |
| `["+", "BigInt"]` | 258.110 ns (1%) | 70.857 ns | 48 bytes (1%) | 3 |
| `["+", "FD{ Int32,2}"]` | -0.027 ns (1%) | | | |
| `["+", "FD{ Int64,2}"]` | -0.217 ns (1%) | | | |
| `["+", "FD{Int128,2}"]` | -37.671 ns (1%) | -18.925 ns | | |
| `["+", "Float32"]` | 0.238 ns (1%) | | | |
| `["+", "Float64"]` | 0.231 ns (1%) | | | |
| `["/", " Int32"]` | 3.837 ns (1%) | | | |
| `["/", " Int64"]` | 5.173 ns (1%) | | | |
| `["/", " Int128"]` | 13.514 ns (1%) | | | |
| `["/", "BigFloat"]` | 142.886 ns (1%) | 2.393 ns | 112 bytes (1%) | 2 |
| `["/", "BigInt"]` | 421.201 ns (1%) | 9.256 ns | 464 bytes (1%) | 10 |
| `["/", "FD{ Int32,2}"]` | 5.627 ns (1%) | | | |
| `["/", "FD{ Int64,2}"]` | 20.864 ns (1%) | | | |
| `["/", "FD{Int128,2}"]` | 2.093 μs (1%) | 505.571 ns | 456 bytes (1%) | 24 |
| `["/", "Float32"]` | 0.413 ns (1%) | | | |
| `["/", "Float64"]` | 0.216 ns (1%) | | | |
| `["div", " Int32"]` | 0.000 ns (1%) | | | |
| `["div", " Int64"]` | 0.032 ns (1%) | | | |
| `["div", " Int128"]` | 0.100 ns (1%) | | | |
| `["div", "BigFloat"]` | 117.968 ns (1%) | 2.174 ns | 112 bytes (1%) | 2 |
| `["div", "BigInt"]` | 263.183 ns (1%) | 70.385 ns | 40 bytes (1%) | 2 |
| `["div", "FD{ Int32,2}"]` | -0.125 ns (1%) | | | |
| `["div", "FD{ Int64,2}"]` | 2.543 ns (1%) | | | |
| `["div", "FD{Int128,2}"]` | 484.794 ns (1%) | 102.801 ns | 128 bytes (1%) | 7 |
| `["div", "Float32"]` | 2.540 ns (1%) | | | |
| `["div", "Float64"]` | 2.390 ns (1%) | | | |
| `["identity", " Int32"]` | 0.265 ns (1%) | | | |
| `["identity", " Int64"]` | 0.323 ns (1%) | | | |
| `["identity", " Int128"]` | 0.525 ns (1%) | | | |
| `["identity", "BigFloat"]` | 151.618 ns (1%) | 13.535 ns | 336 bytes (1%) | 6 |
| `["identity", "BigInt"]` | 695.563 ns (1%) | 157.238 ns | 136 bytes (1%) | 8 |
| `["identity", "FD{ Int32,2}"]` | 1.293 ns (1%) | | | |
| `["identity", "FD{ Int64,2}"]` | 1.266 ns (1%) | | | |
| `["identity", "FD{Int128,2}"]` | 604.572 ns (1%) | 137.165 ns | 128 bytes (1%) | 7 |
| `["identity", "Float32"]` | 0.978 ns (1%) | | | |
| `["identity", "Float64"]` | 1.043 ns (1%) | | | |
## Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
- `["*"]`
- `["+"]`
- `["/"]`
- `["div"]`
- `["identity"]`
## Julia versioninfo
```
Julia Version 1.0.2
Commit d789231e99* (2018-11-08 20:11 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.2.0)
  uname: Darwin 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64 i386
  CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz:
         speed         user         nice          sys         idle          irq
  #1-12  2900 MHz     768878 s          0 s     389078 s    9840166 s          0 s
  Memory: 32.0 GB (3027.5859375 MB free)
  Uptime: 269534.0 sec
  Load Avg: 3.58056640625 3.8349609375 3.80224609375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
```
Codecov Report
Merging #42 into master will not change coverage. The diff coverage is n/a.
```
@@           Coverage Diff           @@
##           master      #42   +/-  ##
=======================================
  Coverage   98.83%   98.83%
=======================================
  Files           1        1
  Lines         172      172
=======================================
  Hits          170      170
  Misses          2        2
```
Just want to leave a status update here.

I think this basically works. The benchmarks run, and after the merged changes in https://github.com/JuliaCI/PkgBenchmark.jl/pull/75, they should correctly and precisely measure only the time for each operation (not copying the value, reading from an array, etc.).

The remaining blocker to merging is that the results are extremely variable, so much so that I don't think they're useful. Even when running on a single computer and comparing a single commit against itself, PkgBenchmark consistently reports statistically significant variance. If anyone has any advice about how to diagnose this, that would be much appreciated!
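For anyone who wants to reproduce it, the self-comparison experiment is basically this (a sketch; I'm assuming PkgBenchmark's `judge` accepts two results objects, and the output filename is arbitrary):

```julia
using PkgBenchmark

# Run the full suite twice on the very same commit...
r1 = benchmarkpkg("FixedPointDecimals")
r2 = benchmarkpkg("FixedPointDecimals")

# ...then compare the two runs. With a stable suite everything should
# be judged invariant; instead this consistently flags "changes".
export_markdown("self_judgement.md", judge(r1, r2))
```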
I've tried several things to pinpoint the source of the variance, but haven't had any luck:
- I tried simplifying the benchmarks to just `@benchmarkable $op($x, $x)`, but saw the same variance there.
- I tried statically compiling a sysimg containing FixedPointDecimals, and using that for running the benchmarks, which didn't help.
- One of my coworkers tried disabling the GC (I'm not sure what steps they took; the basic form of that experiment is sketched below), but said that didn't help either.
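For reference, the basic form of that GC experiment, assuming the `SUITE` from `benchmark/benchmarks.jl` has been `include`d, would be:

```julia
using BenchmarkTools

# Run the suite with automatic collection switched off, then restore it.
GC.enable(false)
try
    run(SUITE; verbose = true)
finally
    GC.enable(true)
end
```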
Does anyone have any other ideas? Without a fix for this, the benchmarks don't seem very useful: sometimes the swings are as large as 100% or 200%, so I'm not sure we'd get meaningful feedback on PRs.
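One knob I haven't exhausted yet (these are real BenchmarkTools parameters, but the values below are guesses I haven't validated): trade run time for stability by giving each benchmark a bigger time budget and heavier samples.

```julia
using BenchmarkTools

# Fewer-but-heavier samples, so OS/scheduler noise averages out
# within each sample (at the cost of a slower overall run).
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 10    # time budget per benchmark (default 5)
BenchmarkTools.DEFAULT_PARAMETERS.samples = 100   # max samples per benchmark (default 10_000)
BenchmarkTools.DEFAULT_PARAMETERS.evals   = 1000  # evals per sample (tune! may override this)
```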