Octavian.jl

Dual-socket support

Open carstenbauer opened this issue 3 years ago • 1 comments

In my recent dgemm comparison benchmarks (on a Zen 3 AMD Milan system), I find that Octavian essentially does not scale at all from single-socket to dual-socket. In the table below, 64 cores corresponds to a full single socket and 128 cores to the full dual-socket system.

| BLAS | # cores | size | GFLOPS |
| --- | --- | --- | --- |
| Intel MKL v2022.0.0 | 128 cores | 10240 | 3279 |
| Intel MKL v2022.0.0 | 64 cores | 10240 | 1684 |
| BLIS 0.9.0 | 128 cores | 10240 | 3893 |
| BLIS 0.9.0 | 64 cores | 10240 | 2014 |
| Octavian 0.3.15 | 128 cores | 10240 | 1843 |
| Octavian 0.3.15 | 64 cores | 10240 | 1802 |
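
To put a number on the scaling gap, here is a minimal sketch that computes the single-socket to dual-socket speedup from the GFLOPS figures reported above. The names `results` and `speedup` are only illustrative, and the "ideal" factor of 2 assumes perfect scaling across sockets, which no library reaches in practice.

```julia
# GFLOPS figures copied from the benchmark results in this issue (matrix size 10240).
# Each entry: (library, GFLOPS on 64 cores, GFLOPS on 128 cores).
results = [
    ("Intel MKL v2022.0.0", 1684, 3279),
    ("BLIS 0.9.0", 2014, 3893),
    ("Octavian 0.3.15", 1802, 1843),
]

# Speedup when going from one socket (64 cores) to two (128 cores).
# A value of 2.0 would mean perfect scaling.
speedup(single, dual) = dual / single

for (name, single, dual) in results
    s = speedup(single, dual)
    println(rpad(name, 22), "speedup = ", round(s; digits = 2),
            ", parallel efficiency = ", round(100 * s / 2; digits = 1), "%")
end
```

With these numbers, MKL and BLIS both come out a little above 1.9x, while Octavian stays close to 1.0x, i.e. the second socket adds almost nothing.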

Would be great to see Octavian perform better here :)

carstenbauer, Jul 13 '22 13:07

With https://github.com/JuliaSIMD/CPUSummary.jl/commit/d93cf1c1765c37c9fbe809b68a3e5f10fb6bb458, it should support dual sockets.

However, Octavian also does not support more than 64 threads: https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/ccd903373524827e92ddb4f68967529e4761626b/src/matmul.jl#L378 This will have to be replaced with the more usual `PolyesterWeave.request_threads`, which returns a tuple. Octavian will then have to iterate over the elements of that tuple when launching threads.

Apparently I never got around to doing that, and instead just accessed some internals to get only the first element of the tuple rather than the entire tuple.

Each element of the tuple corresponds to a set of 64 threads.
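
As a rough illustration of what iterating over the whole tuple could look like, here is a self-contained sketch. It does not call PolyesterWeave; it assumes, purely for illustration, that each tuple element can be treated as a 64-bit mask whose set bits mark the granted threads in that group of 64. The function `foreach_thread` and the example masks are hypothetical, not Octavian or PolyesterWeave API.

```julia
# Hypothetical stand-in for the tuple discussed above: element i is assumed to be
# a UInt64 mask covering thread slots 64*(i-1)+1 through 64*i.
function foreach_thread(f, masks::NTuple{N,UInt64}) where {N}
    for (i, mask) in enumerate(masks)
        offset = 64 * (i - 1)        # first slot of this group, zero-based
        m = mask
        while m != 0
            tz = trailing_zeros(m)   # position of the lowest set bit
            f(offset + tz + 1)       # one-based thread id across all groups
            m &= m - one(m)          # clear the lowest set bit
        end
    end
    return nothing
end

# Example: slots 1 and 3 granted in the first group, slot 1 in the second group.
masks = (UInt64(0b101), UInt64(0b1))
ids = Int[]
foreach_thread(masks) do tid
    push!(ids, tid)                  # a real launcher would start work on `tid` here
end
```

Reading only the first element of such a tuple would visit at most 64 threads, which matches the 64-thread limit described above.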

Then, of course, #152 is a third issue.

chriselrod, Jul 22 '22 03:07