YaoBlocks.jl
Weird Performance of SMatrix and Matrix on instruct!
MWE:
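function instruct2!(state, U, loc)
    # Iterating a matrix is column-major, so this yields U[1,1], U[2,1], U[1,2], U[2,2].
    a, c, b, d = U
    step = 1 << (loc - 1)   # distance between the two amplitudes of a pair
    step_2 = 1 << loc       # size of one block of pairs
    for j in 0:step_2:size(state, 1)-step
        @inbounds for i in j+1:j+step
            # Apply [a b; c d] to the amplitude pair (state[i], state[i+step]).
            YaoArrayRegister.u1rows!(state, i, i+step, a, b, c, d)
        end
    end
    return state
end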
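To make the MWE easier to reason about without the package installed, here is a minimal self-contained sketch of what the kernel computes. The names `u1rows_sketch!`, `instruct2_sketch!` and `dense_reference` are hypothetical and not part of YaoArrayRegister; `u1rows_sketch!` is only an assumed stand-in for `YaoArrayRegister.u1rows!`, taken to apply the 2x2 matrix `[a b; c d]` to one pair of amplitudes. The check against a dense Kronecker product assumes qubit 1 is the least significant bit of the state index.

```julia
using LinearAlgebra  # stdlib, provides the identity `I`

# Hypothetical stand-in for YaoArrayRegister.u1rows!: applies [a b; c d]
# to the amplitude pair (state[i], state[j]) in place.
function u1rows_sketch!(state::AbstractVector, i::Int, j::Int, a, b, c, d)
    w, v = state[i], state[j]
    state[i] = a * w + b * v
    state[j] = c * w + d * v
    return state
end

# Same loop structure as the MWE, with the stand-in kernel and no @inbounds.
function instruct2_sketch!(state::AbstractVector, U::AbstractMatrix, loc::Int)
    a, c, b, d = U  # column-major order: U[1,1], U[2,1], U[1,2], U[2,2]
    step = 1 << (loc - 1)
    step_2 = 1 << loc
    for j in 0:step_2:length(state)-step
        for i in j+1:j+step
            u1rows_sketch!(state, i, i + step, a, b, c, d)
        end
    end
    return state
end

# Dense reference operator: U on qubit `loc`, identity elsewhere.
# Qubit 1 is the fastest-varying index, so it is the rightmost kron factor.
function dense_reference(U::AbstractMatrix, loc::Int, nqubits::Int)
    id2 = Matrix{eltype(U)}(I, 2, 2)
    return reduce(kron, [k == loc ? Matrix(U) : id2 for k in nqubits:-1:1])
end

# Usage: compare the in-place kernel with the dense operator on 3 qubits.
U = ComplexF64[1 2im; 3 4]   # arbitrary 2x2 test matrix (not unitary)
st = ComplexF64.(1:8)
expected = dense_reference(U, 2, 3) * st
instruct2_sketch!(st, U, 2)
@assert st ≈ expected
```

Both `Matrix` and `SMatrix` iterate in column-major order, so the destructuring on the first line of the function should give the same `a, b, c, d` for either type; the sketch only checks correctness and says nothing about the timing difference reported here.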
On my machine with Julia 1.1, the performance of instruct! differs between SMatrix and Matrix inputs. Unexpectedly, SMatrix is even slower, which makes the current QCBM circuit slower than before.
The difference becomes more obvious when several instruct! calls are packed together, e.g. using chained put blocks.
using YaoArrayRegister, StaticArrays, BenchmarkTools
U = @SMatrix rand(ComplexF64, 2, 2)
st = rand(ComplexF64, 1<<20)
julia> @benchmark foreach(k->instruct2!($st, $U, 2), 1:100)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 180.581 ms (0.00% GC)
median time: 186.595 ms (0.00% GC)
mean time: 186.496 ms (0.00% GC)
maximum time: 193.396 ms (0.00% GC)
--------------
samples: 27
evals/sample: 1
julia> @benchmark foreach(k->instruct2!($st, $(Matrix(U)), 2), 1:100)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 169.435 ms (0.00% GC)
median time: 187.297 ms (0.00% GC)
mean time: 188.134 ms (0.00% GC)
maximum time: 208.341 ms (0.00% GC)
--------------
samples: 27
evals/sample: 1
But SMatrix should be faster, as it is with instruct3!:
julia> @benchmark foreach(k->instruct3!($st, $U, 2), 1:100)
BenchmarkTools.Trial:
memory estimate: 80 bytes
allocs estimate: 1
--------------
minimum time: 10.900 ns (0.00% GC)
median time: 13.812 ns (0.00% GC)
mean time: 24.196 ns (38.99% GC)
maximum time: 50.778 μs (99.92% GC)
--------------
samples: 10000
evals/sample: 998
julia> @benchmark foreach(k->instruct3!($st, $(Matrix(U)), 2), 1:100)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 63.906 ns (0.00% GC)
median time: 67.717 ns (0.00% GC)
mean time: 71.838 ns (0.00% GC)
maximum time: 212.529 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 976