VectorizedRNG.jl
Bad NEON performance
julia> using VectorizedRNG, Random, BenchmarkTools
julia> x = Vector{Float64}(undef, 1024);
julia> @benchmark randn!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.838 μs … 4.028 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.852 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.862 μs ± 72.118 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▄
▃███▅▄▂▂▂▂▁▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▂▁▂▁▁▂▁▂▂▁▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂ ▂
2.84 μs Histogram: frequency by time 3.16 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark randn!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.533 μs … 6.983 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.688 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.693 μs ± 77.624 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▇▂▂▂▇▅▂▂▁█▁▁ ▄
▂▁▁▁▂▂▂▂▂▂▂▂▃▃▃▃▆▄▅▅█▇████████████████▆▆▅▇▄▄▃▄▃▃▃▃▂▂▂▂▂▂▂▂ ▄
1.53 μs Histogram: frequency by time 1.83 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rand!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 146 evaluations.
Range (min … max): 698.918 ns … 8.839 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 700.630 ns ┊ GC (median): 0.00%
Time (mean ± σ): 707.420 ns ± 120.934 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▄█▇▄ ▄▄▁ ▂
█████▇▆▆▆▄▅▄▆▅▆▆▆▆▅▅▅████▆▆▇█▅▆▆▄▆▆▆▆▇▆▅▄▁▁▄▅▅▄▅▄▄▅▆▅▅▄▃▆▅▅▄▅ █
699 ns Histogram: log(frequency) by time 755 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rand!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 152 evaluations.
Range (min … max): 682.566 ns … 949.836 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 683.664 ns ┊ GC (median): 0.00%
Time (mean ± σ): 687.544 ns ± 11.115 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
██▅▃▁ ▃▄ ▁▁ ▁
██████▇▇▇▇▇▆▅▆▆▅▆▄▅▄▄████▅▅▅███▇▆▅▆▆▆▆▆▅▅▃▄▄▄▄▄▅▂▄▅▃▅▂▄▅▄▄▄▅▅ █
683 ns Histogram: log(frequency) by time 735 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.9.0-DEV.1073
Commit 0b9eda116d* (2022-08-01 14:27 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.5.0)
CPU: 8 × Apple M1
For comparison, on Cascade Lake:
julia> using VectorizedRNG, Random, BenchmarkTools
julia> x = Vector{Float64}(undef, 1024);
julia> @benchmark randn!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.183 μs … 2.638 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.227 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.229 μs ± 27.961 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▅▆███▇▅▃▁
▂▁▂▂▂▃▃▄▅▇███████████▇▆▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂ ▃
1.18 μs Histogram: frequency by time 1.34 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark randn!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.594 μs … 4.573 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.742 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.744 μs ± 49.598 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂▃▄▅▆████▆▆▇▆▅▂▂▁
▂▁▁▂▂▁▂▂▂▂▂▂▃▃▃▄▄▄▅▆▇████████████████████▇▆▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂ ▅
1.59 μs Histogram: frequency by time 1.87 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rand!(local_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 732 evaluations.
Range (min … max): 173.176 ns … 229.518 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 180.137 ns ┊ GC (median): 0.00%
Time (mean ± σ): 180.299 ns ± 1.007 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█
▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▁▁▂▁▂▁▁▁▂▁▁▁▂▂▁▂▂▂▂▁▂▁▁▂▂▂▅▆██▆▄▂▂▂▂▂▂▃▃▄▃▂ ▂
173 ns Histogram: frequency by time 182 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rand!(Random.default_rng(), $x)
BenchmarkTools.Trial: 10000 samples with 323 evaluations.
Range (min … max): 266.056 ns … 382.669 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 266.514 ns ┊ GC (median): 0.00%
Time (mean ± σ): 266.989 ns ± 1.768 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅▇███▆▅▄▃▁ ▁▂▂▂▂▂▁▂▂▁ ▁▁▂▃▃▂▂▁ ▂
███████████▅▅▃▅▆▅▇██████████▇▆▇▇▅▃▁▁▃▃▆▇███████████▇▇▇▆▆▆▅▅▅▆ █
266 ns Histogram: log(frequency) by time 272 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.9.0-DEV.1172
Commit 18fa3835a7* (2022-08-23 13:44 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
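To make the gap easier to read, here is a rough sketch that turns the median times reported above into slowdown ratios. The `medians` table and the `ratio` helper are illustrative names only, not part of VectorizedRNG.jl or BenchmarkTools; the numbers are copied by hand from the benchmark output, converted to nanoseconds.

```julia
# Median times in nanoseconds, copied by hand from the benchmark output above.
# `medians` and `ratio` are hypothetical helpers for this comparison only.
medians = (
    m1 = (
        randn = (local_rng = 2852.0, default_rng = 1688.0),
        rand  = (local_rng = 700.630, default_rng = 683.664),
    ),
    cascadelake = (
        randn = (local_rng = 1227.0, default_rng = 1742.0),
        rand  = (local_rng = 180.137, default_rng = 266.514),
    ),
)

# A ratio above 1 means local_rng() was slower than Random.default_rng().
ratio(t) = t.local_rng / t.default_rng

for (cpu, results) in pairs(medians), (fn, t) in pairs(results)
    println(cpu, " ", fn, "!: local_rng / default_rng = ", round(ratio(t); digits = 2))
end
```

On the M1 both ratios come out above 1, so `local_rng()` is slower than `Random.default_rng()` there, most clearly for `randn!`. On Cascade Lake both ratios are below 1, so `local_rng()` is the faster of the two.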