Why is this slow?
julia> n = normal(0,1)
julia> @time mean(n, 10000000)
7.962374 seconds (281.66 M allocations: 9.711 GiB, 9.67% gc time)
-0.00016734630189156838
julia> @time mean(randn(10000000))
0.091020 seconds (7 allocations: 76.294 MiB, 3.16% gc time)
0.00016417361940622018
julia> @time mean([rand(Distributions.Normal(0,1)) for i = 1:10000000])
0.204776 seconds (8.96 k allocations: 76.766 MiB, 32.13% gc time)
julia> @benchmark randn()
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 4.542 ns (0.00% GC)
median time: 5.307 ns (0.00% GC)
mean time: 5.585 ns (0.00% GC)
maximum time: 29.316 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> using Mu
julia> @benchmark rand(Mu.normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 3.89 KiB
allocs estimate: 34
--------------
minimum time: 15.589 μs (0.00% GC)
median time: 16.151 μs (0.00% GC)
mean time: 17.636 μs (0.80% GC)
maximum time: 1.464 ms (96.68% GC)
--------------
samples: 10000
evals/sample: 1
julia> using Distributions
julia> @benchmark rand(Normal(0.1))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 7.919 ns (0.00% GC)
median time: 8.824 ns (0.00% GC)
mean time: 8.972 ns (0.00% GC)
maximum time: 44.803 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
~2,000x overhead (≈16 μs per sample for Mu.normal vs ≈8 ns for Distributions.Normal)
This is weird: my times for Mu.normal are about a tenth of yours, but everything is roughly 2x slower.
julia> @benchmark rand(Mu.normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 3.88 KiB
allocs estimate: 33
--------------
minimum time: 1.242 μs (0.00% GC)
median time: 1.521 μs (0.00% GC)
mean time: 2.096 μs (21.77% GC)
maximum time: 355.423 μs (96.01% GC)
--------------
samples: 10000
evals/sample: 10
julia> @benchmark Mu.rand(normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 3.88 KiB
allocs estimate: 33
--------------
minimum time: 671.394 ns (0.00% GC)
median time: 735.423 ns (0.00% GC)
mean time: 988.936 ns (20.00% GC)
maximum time: 11.926 μs (89.64% GC)
--------------
samples: 10000
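Part of the gap here is that rand(Mu.normal(0.0, 1.0)) rebuilds the random variable inside the timed expression, so construction cost is measured along with sampling. BenchmarkTools can keep construction out of the timing either with the setup keyword (as in the next benchmark) or by interpolating a pre-built value with $. A sketch, assuming normal is exported by Mu as in the surrounding benchmarks:

using BenchmarkTools, Mu

# Build the random variable once in setup, so only sampling is timed
@benchmark rand(x) setup=(x = normal(0.0, 1.0))

# Equivalent: interpolate a pre-built value into the expression
@benchmark rand($(normal(0.0, 1.0)))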
julia> @benchmark rand(x, Mu.DiffOmega) setup=(x=normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 848 bytes
allocs estimate: 11
--------------
minimum time: 450.227 ns (0.00% GC)
median time: 459.798 ns (0.00% GC)
mean time: 535.215 ns (10.85% GC)
maximum time: 9.348 μs (93.61% GC)
--------------
samples: 10000
evals/sample: 198
julia> @benchmark rand(x) setup=(x=Distributions.Normal(0, 1))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.880 ns (0.00% GC)
median time: 6.593 ns (0.00% GC)
mean time: 6.698 ns (0.00% GC)
maximum time: 27.988 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> quantilerand(x) = quantile(x, rand())
quantilerand (generic function with 1 method)
julia> @benchmark quantilerand(x) setup=(x=Distributions.Normal(0, 1))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 18.331 ns (0.00% GC)
median time: 19.775 ns (0.00% GC)
mean time: 20.052 ns (0.00% GC)
maximum time: 70.254 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
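For reference, quantilerand is just inverse-transform sampling: draw u uniformly on (0, 1) and push it through the inverse CDF, so the result is distributed like x. The ~18 ns above is therefore roughly the floor for any sampler that routes through the quantile function. A minimal illustration with Distributions:

using Distributions

d = Normal(0.0, 1.0)
u = rand()           # uniform draw on (0, 1)
z = quantile(d, u)   # inverse CDF turns it into a normal sample

quantile(d, 0.5)     # u = 0.5 maps to the median, 0.0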
I think this is about the best we'll get. Most of the remaining overhead is dictionary creation.
julia> @benchmark rand(x, Mu.DiffOmega) setup=(x=normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 816 bytes
allocs estimate: 9
--------------
minimum time: 173.438 ns (0.00% GC)
median time: 182.110 ns (0.00% GC)
mean time: 271.396 ns (29.90% GC)
maximum time: 4.111 μs (92.86% GC)
--------------
samples: 10000
evals/sample: 737
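The dictionary cost is easy to see on its own. A rough sketch; the key and value types below are only a guess at what an omega stores, not Mu's actual layout:

using BenchmarkTools

# A fresh Dict per sample already costs several hundred bytes and multiple allocations
@benchmark Dict{Vector{Int}, Float64}()

# Each insertion also has to hash the Vector{Int} key
@benchmark (d[k] = rand()) setup=(d = Dict{Vector{Int}, Float64}(); k = [1, 2, 3])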
julia> @benchmark rand(x) setup=(x=normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 656 bytes
allocs estimate: 6
--------------
minimum time: 124.614 ns (0.00% GC)
median time: 131.486 ns (0.00% GC)
mean time: 200.081 ns (30.25% GC)
maximum time: 3.154 μs (94.68% GC)
--------------
samples: 10000
evals/sample: 898
Julia 0.7 is a little faster.
With simple omega
julia> @benchmark rand(x) setup=(x=normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 848 bytes
allocs estimate: 6
--------------
minimum time: 119.229 ns (0.00% GC)
median time: 130.683 ns (0.00% GC)
mean time: 197.373 ns (28.16% GC)
maximum time: 56.461 μs (99.62% GC)
--------------
samples: 10000
evals/sample: 927
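For context, a dictionary-backed omega can be pictured as a memo table from an address to a lazily drawn uniform sample. The sketch below is purely illustrative and is not Mu's actual omega implementation:

# Hypothetical sketch: an omega that memoizes uniform draws keyed by an address
struct SketchOmega
    vals::Dict{Vector{Int}, Float64}
end
SketchOmega() = SketchOmega(Dict{Vector{Int}, Float64}())

# Return the sample at this address, drawing it on first access
(ω::SketchOmega)(addr::Vector{Int}) = get!(ω.vals, addr) do
    rand()
end

ω = SketchOmega()
ω([1])    # first call draws and stores a uniform sample
ω([1])    # second call returns the same value

Every sample through such an omega pays for the Dict plus Vector{Int} hashing, which is consistent with the profile notes further down.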
Diff Omega
julia> @benchmark rand(x) setup=(x=normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 704 bytes
allocs estimate: 7
--------------
minimum time: 93.745 ns (0.00% GC)
median time: 99.638 ns (0.00% GC)
mean time: 157.320 ns (34.25% GC)
maximum time: 52.377 μs (99.77% GC)
--------------
samples: 10000
evals/sample: 957
Big overhead increase below: need to profile and fix.
julia> @benchmark rand(x) setup=(x=normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 1.22 KiB
allocs estimate: 16
--------------
minimum time: 752.370 ns (0.00% GC)
median time: 899.571 ns (0.00% GC)
mean time: 1.156 μs (21.26% GC)
maximum time: 570.116 μs (99.78% GC)
--------------
samples: 10000
evals/sample: 127
With LinearΩ (big regression)
julia> @benchmark rand(x) setup=(x=normal(0.0, 1.0))
BenchmarkTools.Trial:
memory estimate: 2.56 KiB
allocs estimate: 45
--------------
minimum time: 2.945 μs (0.00% GC)
median time: 3.085 μs (0.00% GC)
mean time: 3.500 μs (7.50% GC)
maximum time: 313.384 μs (98.39% GC)
--------------
samples: 10000
evals/sample: 8
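To pin down where the LinearΩ regression goes, the standard-library profiler is enough. A minimal sketch, again assuming normal is exported by Mu:

using Profile, Mu

x = normal(0.0, 1.0)
rand(x)                    # warm up so compilation is not profiled

Profile.clear()
@profile for _ = 1:100_000
    rand(x)
end
Profile.print()            # call tree with sample counts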
Looking at the profile:
- Surprising amount of time creating named tuples in trackerr (10%) and in callbacks (12%)
- Tagging with soft err (10%), mostly due to the expense of merge
- Hashing of Vector{Int} is slow (see the sketch after this list)
- Creating the dictionary is expensive
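On the Vector{Int} hashing point, one way to check it is to compare against a tuple key holding the same values, since the tuple avoids the heap-allocated array and the generic array hashing path:

using BenchmarkTools

v = [1, 2, 3, 4]
t = (1, 2, 3, 4)

@benchmark hash($v)   # generic array hashing, heap-allocated key
@benchmark hash($t)   # tuple hashing, no heap indirection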
Solutions:
- Move to a linked list (see the sketch below)
- Don't create a named tuple in applywoerr
- Use a wrapper instead of a Ref
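On the linked-list idea, a minimal immutable list makes extending an address a single cheap allocation that shares structure with its parent. A hypothetical sketch; the names here are made up for illustration:

# Hypothetical immutable linked list for addresses.
# Parametrizing on the tail type keeps every node concretely typed.
struct Nil end
struct Cons{T}
    head::Int
    tail::T
end

cons(i::Int, rest) = Cons(i, rest)

# Extending an address is one small allocation and shares the tail with the parent
root = Nil()
addr = cons(3, cons(2, cons(1, root)))   # address 1 → 2 → 3, innermost first

Lookup in an omega keyed by such addresses still needs hashing or a structural walk, so the real win would have to be confirmed by profiling the actual change.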