oneMKL
BLAS: tests may not be executing the right backend
Summary
When trying a new library we encountered unexpected behaviour: for BLAS (at least), most tests appear to compare the reference libblas to itself (wrapped in SYCL) rather than to the intended backend. This can be reproduced with MKL.
As I understand it, unit tests such as axpy_usm.cpp are compiled and linked against the reference libblas, so the cblas_xxx symbols are already resolved when execution starts.
When a test (for example the RT test AxpyUsmTestSuite/AxpyUsmTests.RealSinglePrecision/Column_Major_SYCL_host_device) is executed, it runs the reference call and then exercises the SYCL backend under test (mklcpu, for example) by dlopening the corresponding libonemkl_blas_mklcpu.so library. But because this dlopen is lazy and not deep-binding, the cblas_saxpy symbol remains the one from libcblas. That is the RT variant of the test; the CT variant does not load anything dynamically, so the problem is worse there.
gdb trace, with breaks on cblas_saxpy and dlopen
gdb --args ./bin/test_main_blas_rt "--gtest_filter=AxpyUsmTestSuite/AxpyUsmTests.RealSinglePrecision/Column_Major_SYCL_host_device" "--gtest_also_run_disabled_tests"
The first breakpoint hit is the reference call, the second is the dlopen, and the third is the call that should come from MKL:
[ RUN ] AxpyUsmTestSuite/AxpyUsmTests.RealSinglePrecision/Column_Major_SYCL_host_device
Breakpoint 1, 0x00007ffff7f9cdd0 in cblas_saxpy () from /foo/lapack/build/lib/libcblas.so.3
(gdb)
Continuing.
Breakpoint 2, ___dlopen (file=0x7ffff7f0c020 "libonemkl_blas_mklcpu.so", mode=257) at ./dlfcn/dlopen.c:77
77 in ./dlfcn/dlopen.c
(gdb) c
Continuing.
[New Thread 0x7fffe79ff640 (LWP 19332)]
[Switching to Thread 0x7fffe79ff640 (LWP 19332)]
Thread 2 "test_main_blas_" hit Breakpoint 1, 0x00007ffff7f9cdd0 in cblas_saxpy () from /foo/lapack/build/lib/libcblas.so.3
(gdb) bt
#0 0x00007ffff7f9cdd0 in cblas_saxpy () from /foo/lapack/build/lib/libcblas.so.3
#1 0x00007ffff7d0f741 in cl::sycl::detail::DispatchHostTask::operator()() const () from /foo/llvm/build/lib/libsycl.so.5
#2 0x00007ffff7c79ae3 in cl::sycl::detail::ThreadPool::worker() () from /foo/llvm/build/lib/libsycl.so.5
We can see that the third breakpoint hit, the call from within the SYCL host task, is inside libcblas and not MKL.
The behaviour of dlopen can be altered by adding RTLD_DEEPBIND, which makes the loaded library prefer its own symbols over previously loaded ones with the same name, and this seems to make the test work as expected.
Adding | RTLD_DEEPBIND at https://github.com/oneapi-src/oneMKL/blob/develop/src/include/function_table_initializer.hpp#L34 yields a different, more expected result (although using it may change the behaviour too much for real codes):
[ RUN ] AxpyUsmTestSuite/AxpyUsmTests.RealSinglePrecision/Column_Major_SYCL_host_device
Breakpoint 2, 0x00007ffff7f9cdd0 in cblas_saxpy () from /foo/lapack/build/lib/libcblas.so.3
(gdb)
Continuing.
Breakpoint 1, ___dlopen (file=0x7ffff7f0c020 "libonemkl_blas_mklcpu.so", mode=265) at ./dlfcn/dlopen.c:77
77 in ./dlfcn/dlopen.c
(gdb)
Continuing.
[New Thread 0x7fffe79ff640 (LWP 20195)]
[Switching to Thread 0x7fffe79ff640 (LWP 20195)]
Thread 2 "test_main_blas_" hit Breakpoint 2, 0x00007fffee720f00 in cblas_saxpy () from /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_ilp64.so.2
(gdb) bt
#0 0x00007fffee720f00 in cblas_saxpy () from /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_ilp64.so.2
#1 0x00007ffff7d0f741 in cl::sycl::detail::DispatchHostTask::operator()() const () from /foo/llvm/build/lib/libsycl.so.5
But that only works for the _rt variant, not the _ct ones, where everything is resolved at compile-time (so in the CT case everything comes from libcblas even with the workaround). Am I missing something here?
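For reference, the dlopen change tried above amounts to something like the following sketch. The names `backend_dlopen_flags` and `load_backend` are hypothetical and are not oneMKL functions; the only facts taken from the traces are the two mode values (257 and 265). RTLD_DEEPBIND is a glibc extension, so it is guarded with an #ifdef.

```cpp
#include <dlfcn.h>   // dlopen, dlerror, RTLD_* flags (RTLD_DEEPBIND is glibc-specific)
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of oneMKL): flags used to load a backend library.
// RTLD_LAZY | RTLD_GLOBAL is 257 on glibc, the mode seen in the first trace.
int backend_dlopen_flags(bool deep_bind) {
    int flags = RTLD_LAZY | RTLD_GLOBAL;
#ifdef RTLD_DEEPBIND
    if (deep_bind) {
        // The loaded library looks up symbols in its own scope before the
        // global scope; on glibc this gives 265, as in the second trace.
        flags |= RTLD_DEEPBIND;
    }
#endif
    return flags;
}

// Hypothetical helper: load a backend library, returning nullptr on failure.
void* load_backend(const char* libname, bool deep_bind) {
    assert(libname != nullptr);
    void* handle = dlopen(libname, backend_dlopen_flags(deep_bind));
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", libname, dlerror());
    }
    return handle;
}
```

Calling `load_backend("libonemkl_blas_mklcpu.so", true)` would correspond to the modified RT case. It does nothing for the CT tests, which never go through dlopen.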
Version
git latest, commit 28c6deecfe
Environment
oneMKL works with multiple HW and backend libraries and also depends on the compiler and build environment. Include the following information to help reproduce the issue:
- Backend library version: mkl latest
- OS name and version: ubuntu 22.04 (wsl2, but we saw the behaviour on other platforms)
- Compiler version (intel/llvm open source, commit 4658b61ab0a )
- CMake output log log_cmake.txt
Steps to reproduce
Build with reference LAPACK as the reference, clang from intel/llvm in the PATH, and MKLROOT set.
example for debug purposes
cmake -DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_C_COMPILER=clang \
-DENABLE_CUBLAS_BACKEND=False \
-DENABLE_MKLCPU_BACKEND=True \
-DENABLE_MKLGPU_BACKEND=False \
-DCMAKE_CXX_FLAGS="-O0 -g3 -DCBLAS_ORDER=CBLAS_LAYOUT" \
-DBUILD_FUNCTIONAL_TESTS=ON \
-DREF_BLAS_ROOT=/path/to/lapack/build/ \
-DREF_LAPACK_ROOT=/path/to/lapack/build/ ..
make
Check behaviour with saxpy usm (on RT test):
gdb --args ./bin/test_main_blas_rt "--gtest_filter=AxpyUsmTestSuite/AxpyUsmTests.RealSinglePrecision/Column_Major_SYCL_host_device" "--gtest_also_run_disabled_tests"
Break on cblas_saxpy to see the backtrace of each call.
Observed behavior
All cblas calls in the test come from libcblas
#0 0x00007ffff7f9cdd0 in cblas_saxpy () from /foo/lapack/build/lib/libcblas.so.3
#1 0x00007ffff7d0f741 in cl::sycl::detail::DispatchHostTask::operator()() const () from /foo/llvm/build/lib/libsycl.so.5
Expected behavior
Half of the calls to cblas_saxpy (the even ones) should come from MKL; the other half (the odd ones) would still come from the reference libcblas library.
#0 0x00007fffee720f00 in cblas_saxpy () from /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_ilp64.so.2
#1 0x00007ffff7d0f741 in cl::sycl::detail::DispatchHostTask::operator()() const () from /foo/llvm/build/lib/libsycl.so.5
@adegomme This is a very interesting find. Thank you for reporting it. We will take a closer look, but my initial guess is that this can only happen in the BLAS domain with the mklcpu backend, where the cblas_*() symbols used in the mklcpu backend are identical to the ones in the libcblas library. We don't have this condition for other domains/backends AFAIK. I am actually working on another issue (#114) where we replace all the internal symbols used in the mklcpu/mklgpu backends with external oneMKL symbols. I believe fixing #114 will also fix this one. Do you agree?
I agree, as this is already pretty much what is done for LAPACK on MKL if I'm not mistaken (using the C++ interface rather than LAPACKE symbols), and the LAPACK tests I checked did not show the bug. Some functions whose symbols may differ because they are less standardised, such as crot/csrot, seem unaffected as well.
But other backends are also affected if they respect the cblas/lapacke interfaces and symbols, so I wouldn't see that as a fix, as it would only work for the mkl backends.
Agreed on your comment for the other backends. @mkrainiuk Could you please help with a global solution for this problem?
Sorry for the very long delay; I have been working on another project for the last two months. I think a more global solution would be to use a runtime loading mechanism for the reference functions and explicitly link only the oneMKL libraries. We already use this approach for several Fortran functions here: https://github.com/oneapi-src/oneMKL/blob/f43e4e5943d0ef3a85ff3331d567eadb09c8fe71/tests/unit_tests/blas/include/reference_blas_templates.hpp#L62
We could extend it to all reference functions so that not only Intel oneMKL but any other third-party library that uses the standard CBLAS/BLAS API won't have these problems in testing. What do you think?
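A minimal sketch of what runtime loading of a reference function could look like, assuming plain dlopen/dlsym on Linux. This is not the code at the linked line: `load_reference_symbol`, `load_reference_saxpy` and `cblas_saxpy_fn` are hypothetical names, and the signature assumes a CBLAS build with 32-bit `int` indices.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlerror
#include <cassert>
#include <cstdio>

// Standard CBLAS saxpy signature, assuming 32-bit int indices (LP64 build).
using cblas_saxpy_fn = void (*)(int n, float alpha, const float* x, int incx,
                                float* y, int incy);

// Hypothetical helper: resolve `symbol` from one explicitly named library.
// RTLD_LOCAL keeps the reference symbols out of the global scope, so they
// cannot be picked up by a backend that uses the same cblas_* names.
// The handle is intentionally never closed so the returned pointer stays valid.
void* load_reference_symbol(const char* libname, const char* symbol) {
    assert(libname != nullptr && symbol != nullptr);
    void* handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", libname, dlerror());
        return nullptr;
    }
    void* sym = dlsym(handle, symbol);
    if (sym == nullptr) {
        std::fprintf(stderr, "dlsym(%s) failed: %s\n", symbol, dlerror());
    }
    return sym;
}

// Hypothetical wrapper: fetch the reference saxpy, or nullptr if unavailable.
cblas_saxpy_fn load_reference_saxpy(const char* libname) {
    return reinterpret_cast<cblas_saxpy_fn>(
        load_reference_symbol(libname, "cblas_saxpy"));
}
```

A test would then call something like `load_reference_saxpy("libcblas.so.3")` once and invoke the returned pointer for the reference result, while the test binary itself links only against the oneMKL libraries.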
@mkrainiuk This sounds like the right solution to me. Let me know how I can help.
After some more investigation of this in connection with #210, @mmeterel and I noticed that this affects tests that call the cblas interface to oneMKL but not the Fortran interface. Unfortunately, in #210 we are switching to external interfaces for everything (because that's the right long-term solution), which means the present issue with the wrong backend will now affect every test in the BLAS domain.