Octavian.jl
Dual-socket support
In my recent dgemm comparison benchmarks (on a Zen 3 AMD Milan system), I find that Octavian essentially does not scale at all from a single socket to two sockets. In the table below, 64 cores corresponds to one full socket and 128 cores to the full dual-socket system.
| BLAS library | Cores | Size | GFLOPS |
|---|---|---|---|
| Intel MKL v2022.0.0 | 128 | 10240 | 3279 |
| Intel MKL v2022.0.0 | 64 | 10240 | 1684 |
| BLIS 0.9.0 | 128 | 10240 | 3893 |
| BLIS 0.9.0 | 64 | 10240 | 2014 |
| Octavian 0.3.15 | 128 | 10240 | 1843 |
| Octavian 0.3.15 | 64 | 10240 | 1802 |
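One way to put a number on "not scaling" is the ratio of 128-core to 64-core throughput. Perfect scaling from one socket to two would give 2.0. The short Julia sketch below just redoes that arithmetic on the GFLOPS values from the table. The helper name `speedup` is illustrative only.

```julia
# GFLOPS at size 10240, copied from the benchmark table.
results = Dict(
    "Intel MKL v2022.0.0" => (cores64 = 1684.0, cores128 = 3279.0),
    "BLIS 0.9.0"          => (cores64 = 2014.0, cores128 = 3893.0),
    "Octavian 0.3.15"     => (cores64 = 1802.0, cores128 = 1843.0),
)

# Speedup from one socket (64 cores) to two sockets (128 cores); ideal is 2.0.
speedup(r) = r.cores128 / r.cores64

# Parallel efficiency relative to perfect 2x scaling, in percent.
efficiency(r) = 100 * speedup(r) / 2

for (name, r) in sort(collect(results); by = first)
    println(rpad(name, 22), "speedup = ", round(speedup(r); digits = 2),
            ", efficiency = ", round(efficiency(r); digits = 1), "%")
end
```

MKL and BLIS land at roughly 1.93 to 1.95, while Octavian comes out at about 1.02. In other words, the second socket contributes almost nothing.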
Would be great to see Octavian perform better here :)
With https://github.com/JuliaSIMD/CPUSummary.jl/commit/d93cf1c1765c37c9fbe809b68a3e5f10fb6bb458, it should support dual sockets.
However, Octavian also does not support more than 64 threads:
https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/ccd903373524827e92ddb4f68967529e4761626b/src/matmul.jl#L378
This will have to be replaced with the more usual PolyesterWeave.request_threads, which returns a tuple.
Octavian will then have to iterate over every element of that tuple when launching threads.
Apparently I never got around to doing that; instead I accessed some internals to get only the first element of the tuple rather than the whole tuple.
Each element of the tuple corresponds to a set of 64 threads, so using only the first element caps Octavian at 64 threads.
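To illustrate why reading only the first element caps the thread count, here is a minimal sketch under a loudly hypothetical model. It assumes the thread request yields an `NTuple{K,UInt64}` in which each element is a bitmask for one block of 64 threads and a set bit marks an available thread. This is not claimed to be PolyesterWeave's actual return type, and `thread_ids` and `launch_all` are hypothetical names, not Octavian or PolyesterWeave functions. The point is only that the loop walks every element of the tuple instead of stopping at `masks[1]`.

```julia
# HYPOTHETICAL model (not PolyesterWeave's real API): availability of up to
# 64*K threads is an NTuple{K,UInt64}; bit b (0-based) of element k marks
# thread (k-1)*64 + b + 1 as free.

"""
    thread_ids(masks)

Collect the 1-based thread IDs encoded in *every* 64-bit mask of `masks`,
rather than only in `masks[1]`.
"""
function thread_ids(masks::NTuple{K,UInt64}) where {K}
    ids = Int[]
    for (k, mask) in enumerate(masks)   # one mask per block of 64 threads
        m = mask
        while m != 0
            b = trailing_zeros(m)        # index of the lowest set bit
            push!(ids, (k - 1) * 64 + b + 1)
            m &= m - one(m)              # clear the lowest set bit
        end
    end
    return ids
end

# Hypothetical launcher: call `f` once per available thread ID
# (a stand-in for actually scheduling work on that thread).
function launch_all(f, masks::NTuple{K,UInt64}) where {K}
    for tid in thread_ids(masks)
        f(tid)
    end
end

# Two fully available blocks, i.e. 128 threads on a dual-socket machine.
masks = (typemax(UInt64), typemax(UInt64))
println(length(thread_ids(masks)))   # all blocks
println(count_ones(masks[1]))        # what looking only at masks[1] sees
```

Under this model, looking only at `masks[1]` can never see more than 64 threads, which matches the cap described above. Iterating over the whole tuple lifts that limit.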
Then, of course, #152 is a third issue.