RigidBodyDynamics.jl
Don't exploit sparsity in Transform3D, SpatialInertia?
Especially on newer CPU architectures, it may be faster not to exploit sparsity, e.g. to multiply homogeneous transforms as full 4×4 matrices rather than handling the rotation and translation parts separately.
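For reference, here is a minimal sketch of the two formulations being compared, checking that they give the same result. The helper `homogeneous` is hypothetical (it is not part of RigidBodyDynamics.jl or StaticArrays), and the inputs are random matrices as in the benchmarks, not proper rotations.

```julia
using StaticArrays

# Hypothetical helper: pack a 3×3 "rotation" block and a translation into a
# dense 4×4 homogeneous transform matrix with bottom row [0 0 0 1].
function homogeneous(rot::SMatrix{3, 3, T}, trans::SVector{3, T}) where {T}
    z = zero(T)
    @SMatrix [rot[1, 1] rot[1, 2] rot[1, 3] trans[1];
              rot[2, 1] rot[2, 2] rot[2, 3] trans[2];
              rot[3, 1] rot[3, 2] rot[3, 3] trans[3];
              z         z         z         one(T)]
end

arot = rand(SMatrix{3, 3, Float64})
brot = rand(SMatrix{3, 3, Float64})
atrans = rand(SVector{3, Float64})
btrans = rand(SVector{3, Float64})

# Exploiting sparsity: only the nonzero blocks are multiplied.
rot_sparse = arot * brot
trans_sparse = atrans + arot * btrans

# Not exploiting sparsity: one dense 4×4 matrix product.
dense = homogeneous(arot, atrans) * homogeneous(brot, btrans)

# Both routes describe the same composed transform.
same = dense ≈ homogeneous(rot_sparse, trans_sparse)
```

The dense product also computes the constant bottom row, so it does more floating-point work than the block-wise version. The benchmarks below measure whether that extra work is offset in practice.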
AVX2-capable machine:
Julia Version 0.6.0-pre.beta.295
Commit dc907c7 (2017-04-24 04:37 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Older, non-AVX2-capable machine:
Julia Version 0.6.0-pre.beta.295
Commit dc907c760f (2017-04-24 04:37 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
In each case, I rebuilt the system image for the native architecture.
Exploiting sparsity
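using BenchmarkTools, StaticArrays

# Compose two transforms block-wise: multiply the 3×3 rotation parts and
# transform the translation part separately.
@benchmark (arot * brot, atrans + arot * btrans) setup = begin
    arot = rand(SMatrix{3, 3})
    brot = rand(SMatrix{3, 3})
    atrans = rand(SVector{3})
    btrans = rand(SVector{3})
end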
AVX2:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 12.423 ns (0.00% GC)
median time: 12.496 ns (0.00% GC)
mean time: 12.906 ns (0.00% GC)
maximum time: 32.182 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
Non-AVX2:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 10.598 ns (0.00% GC)
median time: 11.208 ns (0.00% GC)
mean time: 11.527 ns (0.00% GC)
maximum time: 91.898 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
time tolerance: 5.00%
memory tolerance: 1.00%
Not exploiting sparsity
@benchmark a * b setup = (a = rand(SMatrix{4, 4}); b = rand(SMatrix{4, 4}))
AVX2:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.331 ns (0.00% GC)
median time: 5.344 ns (0.00% GC)
mean time: 5.565 ns (0.00% GC)
maximum time: 26.861 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
Non-AVX2:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 7.679 ns (0.00% GC)
median time: 8.138 ns (0.00% GC)
mean time: 8.520 ns (0.00% GC)
maximum time: 54.870 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
time tolerance: 5.00%
memory tolerance: 1.00%
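To make the comparison explicit, the ratios of the median times reported above can be worked out as follows. This is only arithmetic on the numbers already shown; the variable names are illustrative.

```julia
# Median times in nanoseconds, copied from the benchmark output above.
sparse_avx2 = 12.496
dense_avx2 = 5.344
sparse_non_avx2 = 11.208
dense_non_avx2 = 8.138

# Speedup of the dense 4×4 product over the sparsity-exploiting version.
speedup_avx2 = sparse_avx2 / dense_avx2              # ≈ 2.34
speedup_non_avx2 = sparse_non_avx2 / dense_non_avx2  # ≈ 1.38
```

On these two machines the dense product is faster in both cases, with the larger gain on the AVX2-capable CPU.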
Doing this before #207 would simplify #207.