Don't exploit sparsity in Transform3D, SpatialInertia?

Open • tkoolen opened this issue 8 years ago • 1 comment

Especially on newer CPU architectures, it may be faster not to exploit sparsity in, e.g., the multiplication of homogeneous transforms. Instead, we could simply multiply the full 4×4 matrices.
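Both formulations compared in this issue compute the same result. Composing `[R_a p_a; 0 1]` with `[R_b p_b; 0 1]` gives `[R_a*R_b  p_a + R_a*p_b; 0 1]`. The "sparse" version computes only the two nonzero blocks. The dense version multiplies the full 4×4 matrices, including the known zeros and the trailing one. The following is a minimal sketch of that equivalence. `RigidTransform`, `compose`, and `homogeneous` are hypothetical names standing in for `Transform3D`; they are not its actual definition. The sketch uses plain `Matrix`/`Vector` instead of StaticArrays so that it runs without packages. It therefore only checks correctness and says nothing about speed.

```julia
using LinearAlgebra  # dense matrix products on Matrix{Float64}

# Hypothetical stand-in for Transform3D: rotation block R and translation p,
# representing the homogeneous matrix [R p; 0 0 0 1].
struct RigidTransform
    R::Matrix{Float64}
    p::Vector{Float64}
end

# Sparsity-exploiting composition: only the nonzero blocks are computed.
compose(a::RigidTransform, b::RigidTransform) =
    RigidTransform(a.R * b.R, a.p + a.R * b.p)

# Full 4×4 homogeneous matrix for the same transform.
function homogeneous(t::RigidTransform)
    H = zeros(4, 4)
    H[1:3, 1:3] = t.R
    H[1:3, 4] = t.p
    H[4, 4] = 1.0
    return H
end

# Random 3×3 blocks, as in the benchmarks (not necessarily rotations).
a = RigidTransform(rand(3, 3), rand(3))
b = RigidTransform(rand(3, 3), rand(3))

# Dense composition ignores the structure and multiplies all 16 entries.
dense = homogeneous(a) * homogeneous(b)
structured = homogeneous(compose(a, b))
println(dense ≈ structured)  # true (equal up to floating-point rounding)
```

The dense path spends work on entries whose values are known in advance. The question raised here is whether that regularity pays off anyway on SIMD hardware.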

AVX2-capable machine:

Julia Version 0.6.0-pre.beta.295
Commit dc907c7 (2017-04-24 04:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

Older, non-AVX2-capable machine:

Julia Version 0.6.0-pre.beta.295
Commit dc907c760f (2017-04-24 04:37 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

In each case, I rebuilt the system image for the native architecture.

Exploiting sparsity

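using BenchmarkTools, StaticArrays

# Compose two transforms stored as (rotation, translation) pairs,
# computing only the nonzero blocks of the 4×4 homogeneous matrices.
@benchmark (arot * brot, atrans + arot * btrans) setup = begin
    arot = rand(SMatrix{3, 3})
    brot = rand(SMatrix{3, 3})
    atrans = rand(SVector{3})
    btrans = rand(SVector{3})
end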

AVX2:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.423 ns (0.00% GC)
  median time:      12.496 ns (0.00% GC)
  mean time:        12.906 ns (0.00% GC)
  maximum time:     32.182 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

Non-AVX2:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.598 ns (0.00% GC)
  median time:      11.208 ns (0.00% GC)
  mean time:        11.527 ns (0.00% GC)
  maximum time:     91.898 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999
  time tolerance:   5.00%
  memory tolerance: 1.00%

Not exploiting sparsity

# Dense product of full 4×4 homogeneous matrices.
@benchmark a * b setup = (a = rand(SMatrix{4, 4}); b = rand(SMatrix{4, 4}))

AVX2:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.331 ns (0.00% GC)
  median time:      5.344 ns (0.00% GC)
  mean time:        5.565 ns (0.00% GC)
  maximum time:     26.861 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Non-AVX2:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     7.679 ns (0.00% GC)
  median time:      8.138 ns (0.00% GC)
  mean time:        8.520 ns (0.00% GC)
  maximum time:     54.870 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999
  time tolerance:   5.00%
  memory tolerance: 1.00%
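For context, here is a rough operation count. It is an estimate that counts each scalar multiply and add as one flop and ignores FMA and vectorization. It is set against the median times reported above:

```latex
\begin{aligned}
\text{structured: } & \underbrace{27\,\mathrm{mul} + 18\,\mathrm{add}}_{R_a R_b}
  + \underbrace{9\,\mathrm{mul} + 6\,\mathrm{add}}_{R_a p_b}
  + \underbrace{3\,\mathrm{add}}_{p_a + R_a p_b}
  = 36\,\mathrm{mul} + 27\,\mathrm{add} = 63 \\
\text{dense } 4 \times 4: & \; 16 \cdot (4\,\mathrm{mul} + 3\,\mathrm{add}) = 64\,\mathrm{mul} + 48\,\mathrm{add} = 112 \\
\text{median speedup of dense: } & \; 12.496 / 5.344 \approx 2.34 \ (\text{AVX2}), \qquad 11.208 / 8.138 \approx 1.38 \ (\text{non-AVX2})
\end{aligned}
```

The dense product therefore does roughly 1.8× as many flops, yet it was faster on both machines. The margin was larger on the AVX2 machine.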

Apr 24 '17 16:04 tkoolen

Doing this before #207 would simplify #207.

Apr 24 '17 17:04 tkoolen