Octavian.jl Dual-socket support

Open carstenbauer opened this issue 3 years ago • 1 comments

In my recent dgemm comparison benchmarks (on a Zen 3 AMD Milan system) I find that Octavian essentially does not scale at all from single-socket to dual-socket. In the table below, 64 cores corresponds to one full socket and 128 cores to the full dual-socket system.

| BLAS | Cores | Size | GFLOPS |
|---|---|---|---|
| Intel MKL v2022.0.0 | 128 | 10240 | 3279 |
| Intel MKL v2022.0.0 | 64 | 10240 | 1684 |
| BLIS 0.9.0 | 128 | 10240 | 3893 |
| BLIS 0.9.0 | 64 | 10240 | 2014 |
| Octavian 0.3.15 | 128 | 10240 | 1843 |
| Octavian 0.3.15 | 64 | 10240 | 1802 |
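
To put a number on "not scaling", here is a small sketch that computes the 64 to 128 core speedup from the GFLOPS figures in the table above. The helper name `socket_scaling` is made up for this illustration. With these numbers MKL and BLIS come out at roughly 1.95x and 1.93x, while Octavian stays at about 1.02x.

```python
# GFLOPS figures copied from the benchmark table (size 10240).
gflops = {
    "Intel MKL v2022.0.0": {64: 1684, 128: 3279},
    "BLIS 0.9.0": {64: 2014, 128: 3893},
    "Octavian 0.3.15": {64: 1802, 128: 1843},
}


def socket_scaling(results):
    """Return (speedup, parallel efficiency) when going from 64 to 128 cores.

    Hypothetical helper for illustration only. Efficiency is the speedup
    divided by the ideal factor of 2 for doubling the core count.
    """
    speedup = results[128] / results[64]
    efficiency = speedup / (128 / 64)
    return speedup, efficiency


scaling = {name: socket_scaling(r) for name, r in gflops.items()}

for name, (speedup, efficiency) in scaling.items():
    print(f"{name}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency near 100% means the second socket is fully used; a value near 50% means the second socket adds almost nothing.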

Would be great to see Octavian perform better here :)

Jul 13 '22 13:07 carstenbauer

With https://github.com/JuliaSIMD/CPUSummary.jl/commit/d93cf1c1765c37c9fbe809b68a3e5f10fb6bb458 it should support dual sockets.

However, Octavian also does not support more than 64 threads: https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/ccd903373524827e92ddb4f68967529e4761626b/src/matmul.jl#L378 That call will have to be replaced with the more usual PolyesterWeave.request_threads, which returns a tuple, and Octavian will then have to iterate over the tuple's elements when launching threads.

Apparently I never got around to doing that, and just accessed some internals to only get the first element of the tuple, rather than the entire tuple.

Each element of the tuple corresponds to a set of 64 threads.
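
As a rough illustration of why reading only the first tuple element caps things at 64 threads, here is a language-neutral sketch in Python. It assumes each tuple element is a 64-bit mask with one bit per thread slot. That representation, the zero-based thread ids, and the names `thread_ids` and `first_group_only` are assumptions made for this example, not PolyesterWeave's actual API.

```python
# Assumption for this sketch: one 64-bit mask per group of 64 thread slots.
GROUP_SIZE = 64


def thread_ids(masks):
    """Yield a global (zero-based) thread id for every set bit in every mask."""
    for group, mask in enumerate(masks):
        offset = group * GROUP_SIZE
        while mask:
            low_bit = mask & -mask  # isolate the lowest set bit
            yield offset + low_bit.bit_length() - 1
            mask ^= low_bit  # clear that bit and continue


def first_group_only(masks):
    """Mimic reading just the first tuple element, ignoring the rest."""
    return list(thread_ids(masks[:1]))


# Two fully populated groups model a 128-thread request.
masks = (2**64 - 1, 2**64 - 1)

all_ids = list(thread_ids(masks))
capped_ids = first_group_only(masks)

print(len(all_ids), len(capped_ids))
```

Iterating over every element reaches all 128 slots, whereas taking only the first element never sees more than 64, regardless of how many threads were requested.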

Then, of course, #152 is a third issue.

Jul 22 '22 03:07 chriselrod
