BenchmarksGame.jl
WIP: Hacky(!) but faster nbody implementations
Here are a couple of messy implementations that, at least on my laptop with `--cpu-target=core2` (the architecture of the actual BenchmarksGame test machine), beat the current `...simd.jl` by about 40% and 60%:
| impl | 1st run | 2nd run | speedup vs. simd |
|---|---|---|---|
| simd | 5.95s | 5.75s | – |
| unsafe_simd | 7.6s | 4.15s | 40% |
| unsafe_simd_unroll | 7.3s | 3.6s | 60% |
| Rust #7 | – | 3.1s | 85% |
I'd like some feedback before cleaning this up further (and getting too deep into this rabbit hole 🙂), in particular on whether this is helpful for showing off the language, since the code is getting far from idiomatic Julia.
A few caveats:
- Not idiomatic Julia, because we're porting gcc #4 and Rust #7, which liberally use SIMD intrinsics, lay out memory by hand, etc.
- Compilation time is much longer (probably due to using `StaticArrays`), so this awaits Julia AOT compilation (#35) to show real gains; alternatively, I can switch to using `NTuple`s with unsafe stores.
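To illustrate the `NTuple`-plus-unsafe-stores alternative mentioned above, here is a minimal sketch (not code from this PR; the function name and layout are made up for the example). The idea is to keep each body's coordinates in a flat `Vector{Float64}`, load them as a stack-allocated `NTuple{3,Float64}`, and write results back with `unsafe_store!`, avoiding the `StaticArrays` dependency entirely:

```julia
# Hypothetical sketch: operate on 3 coordinates per body stored contiguously
# in a flat Vector{Float64}, using raw pointer loads/stores instead of
# StaticArrays. Names and structure are illustrative only.
function scale_positions!(buf::Vector{Float64}, nbodies::Int, s::Float64)
    p = pointer(buf)
    @inbounds for i in 0:nbodies-1
        # Load one body's coordinates as an NTuple (no heap allocation).
        pos = (unsafe_load(p, 3i + 1),
               unsafe_load(p, 3i + 2),
               unsafe_load(p, 3i + 3))
        pos = pos .* s                      # broadcast over the tuple
        unsafe_store!(p, pos[1], 3i + 1)
        unsafe_store!(p, pos[2], 3i + 2)
        unsafe_store!(p, pos[3], 3i + 3)
    end
    return buf
end

buf = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# GC.@preserve keeps `buf` rooted while we hold a raw pointer into it.
GC.@preserve buf scale_positions!(buf, 2, 2.0)
```

The `GC.@preserve` is required for correctness: `pointer(buf)` is only valid while the array is guaranteed not to be collected or moved.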
By the way, the `...unroll.jl` file has a hacky macro that fully unrolls some of the inner loops. This mimics how Rust #7 achieves its speedup: `rustc` is smart enough to automatically unroll the (outer) `for` loops inside `advance`; e.g. `rsqrt` appears 5 times in the disassembled output.
I didn't go all the way to unrolling the stride-2 loop, but I could be persuaded to hack something up just to see how much improvement can be found.
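For readers curious what "a macro that fully unrolls some of the inner loops" can look like, here is a minimal sketch (this is not the actual macro from `...unroll.jl`; the name `@unroll` and the interface are assumptions for illustration). It splices N literal copies of the loop body at macro-expansion time, binding the loop variable to a constant in each copy, which gives the compiler the same constant-index bodies that `rustc` produces by unrolling automatically:

```julia
# Hypothetical unrolling macro: expands a fixed-trip-count for loop into N
# copies of its body with the loop variable bound to the literal 1..N.
macro unroll(n::Int, ex)
    @assert ex.head == :for "expected a for loop"
    itervar = ex.args[1].args[1]   # the loop variable, e.g. `i`
    body    = ex.args[2]           # the loop body
    block = Expr(:block)
    for i in 1:n
        # Each copy sees `itervar` as a compile-time constant.
        push!(block.args, :(let $itervar = $i; $body; end))
    end
    return esc(block)
end

function sum_unrolled(xs::NTuple{5,Float64})
    acc = 0.0
    @unroll 5 for i in 1:5   # the range is ignored; the macro uses N = 5
        acc += xs[i]
    end
    return acc
end
```

With the loop index a literal in each copy, tuple indexing like `xs[i]` needs no bounds logic at runtime, which is the same effect the fully unrolled `advance` loops rely on.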
@KristofferC Thanks again for your help getting intrinsics working.
Very cool. I think we can probably polish it to be more idiomatic Julia over time. I've been trying to get this one faster using SIMD intrinsics on my machine for a while now and mostly failing.
My preference would be to just use `NTuple`s with `unsafe_store!` for now. I think getting AOT compilation working consistently on the benchmarks-game machine might be a ways off. And it might not be accepted at all, depending on whether the maintainer wants to deal with the headache.