Setting MKL thread number
I'm not sure I understood the docs correctly, but I assumed that BLAS.set_num_threads(n) == mkl_set_num_threads(n) and BLAS.get_num_threads() == mkl_get_max_threads(). It seems, however, that this might not be the case. If I call mkl_set_num_threads first, BLAS.get_num_threads shows the correct value, but calling BLAS.set_num_threads afterwards does not affect the result of mkl_get_max_threads. Moreover, if I call BLAS.set_num_threads first, subsequent calls to mkl_set_num_threads do not change the value returned by BLAS.get_num_threads. I also tried MKL_DYNAMIC=TRUE, but it made no difference. Can someone clarify what's going on?
$ MKL_DYNAMIC=FALSE julia --project
julia> using MKL, MKL.MKL_jll
julia> using LinearAlgebra
julia> get_max_threads() = ccall((:mkl_get_max_threads, libmkl_rt), Int32, ());
julia> set_max_threads(n) = ccall((:mkl_set_num_threads, libmkl_rt), Cvoid, (Ptr{Int32},), Ref(Int32(n)));
julia> mkl_get_dynamic() = ccall((:mkl_get_dynamic, libmkl_rt), Int32, ());
julia> mkl_get_dynamic()
0
julia> BLAS.get_num_threads()
96
julia> get_max_threads()
96
julia> set_max_threads(55)
julia> get_max_threads()
55
julia> BLAS.get_num_threads()
55
julia> BLAS.set_num_threads(60)
julia> BLAS.get_num_threads()
60
julia> get_max_threads()
55
$ MKL_DYNAMIC=FALSE julia --project
julia> using MKL, MKL.MKL_jll
julia> using LinearAlgebra
julia> get_max_threads() = ccall((:mkl_get_max_threads, libmkl_rt), Int32, ());
julia> set_max_threads(n) = ccall((:mkl_set_num_threads, libmkl_rt), Cvoid, (Ptr{Int32},), Ref(Int32(n)));
julia> mkl_get_dynamic() = ccall((:mkl_get_dynamic, libmkl_rt), Int32, ());
julia> mkl_get_dynamic()
0
julia> BLAS.get_num_threads()
96
julia> BLAS.set_num_threads(55)
julia> BLAS.get_num_threads()
55
julia> get_max_threads()
96
julia> set_max_threads(60)
julia> get_max_threads()
60
julia> BLAS.get_num_threads()
55
Hi @hakkelt, in terms of oneMKL logic, BLAS.set_num_threads(n) == mkl_set_num_threads(n) and BLAS.get_num_threads() == mkl_get_max_threads() do not always hold.
As far as I know, BLAS.set_num_threads uses mkl_domain_set_num_threads and BLAS.get_num_threads() uses mkl_domain_get_max_threads. If the domain-specific function mkl_domain_set_num_threads has not been called, your assumption is correct: BLAS uses the number of threads set by mkl_set_num_threads. Once mkl_domain_set_num_threads has been called, however, BLAS uses the domain-specific number of threads instead of the number set by the more general mkl_set_num_threads function. Hope this clarifies the oneMKL behavior.
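To make the precedence rule concrete, here is a toy model in plain Julia. It is only a sketch of the rule as described above and calls no MKL function; the type `ThreadSettings` and the four helper names are made up for illustration. Replaying the two sessions from the first post against it gives the same numbers as the transcripts.

```julia
# Toy model of the precedence rule; NOT MKL code, all names are hypothetical.
mutable struct ThreadSettings
    global_threads::Int               # value set by the general setter
    blas_threads::Union{Nothing,Int}  # domain-specific value, `nothing` until set
end

# Stand-ins for mkl_set_num_threads / mkl_get_max_threads
set_global!(s::ThreadSettings, n::Integer) = (s.global_threads = n; s)
get_global(s::ThreadSettings) = s.global_threads

# Stand-ins for the BLAS-domain setter and getter
set_blas!(s::ThreadSettings, n::Integer) = (s.blas_threads = n; s)
# The domain value wins once it has been set; otherwise fall back to the global one.
get_blas(s::ThreadSettings) = something(s.blas_threads, s.global_threads)

# First session: global setter first, then the BLAS-domain setter
s = ThreadSettings(96, nothing)
set_global!(s, 55)
println((get_global(s), get_blas(s)))   # (55, 55)
set_blas!(s, 60)
println((get_global(s), get_blas(s)))   # (55, 60)

# Second session: BLAS-domain setter first, then the global setter
t = ThreadSettings(96, nothing)
set_blas!(t, 55)
println((get_global(t), get_blas(t)))   # (96, 55)
set_global!(t, 60)
println((get_global(t), get_blas(t)))   # (60, 55)
```

In this model the global setter never touches the domain value, which is why the BLAS getter stops following it after the first domain-specific call.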
Hi @mkrainiuk, thanks for the clarification; it helped a lot!
However, the problem I faced in my project remains: BLAS.set_num_threads sets both the BLAS and LAPACK thread counts when using OpenBLAS, but only the BLAS thread count when using MKL. Moreover, it is currently not possible to set only the LAPACK thread count with MKL; the thread count can be set either for BLAS alone or for all MKL domains together. In the Intel forum, they said they will consider adding that domain in a future release... :/
In my current project, I managed to solve the problem by calling the C API directly and setting the thread count for all domains:
mkl_get_num_threads() = ccall((:mkl_get_max_threads, libmkl_rt), Int32, ())
mkl_set_num_threads(n) = ccall((:mkl_set_num_threads, libmkl_rt), Cvoid, (Ptr{Int32},), Ref(Int32(n)))
Wouldn't it make sense to expose these functions in MKL.jl and add some explanation to the README? Or is there a more straightforward solution?
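If the thread count only needs to change for one section of code, a small scoped helper can wrap such a getter/setter pair. The following is a minimal sketch, not part of MKL.jl: `with_num_threads` and its `getter`/`setter` keywords are hypothetical names, and it assumes the getter returns a value that the setter can later restore.

```julia
# Hypothetical helper: run `f` with a temporary thread count, then restore the old one.
# `getter` and `setter` can be any matching pair of zero-argument getter and
# one-argument setter, for example the two ccall wrappers shown above.
function with_num_threads(f, n::Integer; getter, setter)
    old = getter()        # remember the current setting
    setter(n)
    try
        return f()
    finally
        setter(old)       # restore even if `f` throws
    end
end

# Stand-in demonstration with a fake setting, so the sketch runs without MKL:
fake_threads = Ref(8)
inside = with_num_threads(2; getter = () -> fake_threads[],
                             setter = n -> (fake_threads[] = n)) do
    fake_threads[]        # value seen while the block is running
end
println((inside, fake_threads[]))
```

With the wrappers above it would be called as `with_num_threads(1; getter = mkl_get_num_threads, setter = mkl_set_num_threads) do ... end`.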
I'm not sure whether a new issue should be created, but I wanted to revive this one to ask for a way to change the thread count at runtime. Since BLAS.set_num_threads doesn't actually make MKL single-threaded, this is incorrect.
For reference, here is the thread started by @hakkelt himself on the Intel oneMKL forum:
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-set-number-of-LAPACK-threads-during-runtime/m-p/1638274#M36554
Fixed in #180