sycl-blas
`blas1_rotg_test` and `blas1_rotmg_test` fail
Hello everyone, I was trying out your library, but the tests in the title fail.
Steps to reproduce:
git clone --recursive https://github.com/codeplaysoftware/portBLAS.git
cd portBLAS
export CC=icx
export CXX=icpx
cmake -S . -B build -DSYCL_COMPILER=dpcpp
cd build
make all
make test
blas1_rotg_test
output:
Device vendor: Intel(R) Corporation
Device name: Intel(R) UHD Graphics 620
Device type: gpu
...
[ RUN ] Rotg/RotgFloat.test/alloc_usm__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
Actual: false
Expected: true
[ FAILED ] Rotg/RotgFloat.test/alloc_usm__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("usm", 4-byte object <00-00 00-00>, 3.40282e+38, -3.40282e+38) (0 ms)
...
[ RUN ] Rotg/RotgFloat.test/alloc_usm__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
Actual: false
Expected: true
[ FAILED ] Rotg/RotgFloat.test/alloc_usm__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("usm", 4-byte object <01-00 00-00>, 3.40282e+38, -3.40282e+38) (0 ms)
...
[ RUN ] Rotg/RotgFloat.test/alloc_buf__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
Actual: false
Expected: true
[ FAILED ] Rotg/RotgFloat.test/alloc_buf__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("buf", 4-byte object <00-00 00-00>, 3.40282e+38, -3.40282e+38) (1 ms)
...
[ RUN ] Rotg/RotgFloat.test/alloc_buf__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
Actual: false
Expected: true
[ FAILED ] Rotg/RotgFloat.test/alloc_buf__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("buf", 4-byte object <01-00 00-00>, 3.40282e+38, -3.40282e+38) (1 ms)
blas1_rotmg_test
output:
Device vendor: Intel(R) Corporation
Device name: Intel(R) UHD Graphics 620
Device type: gpu
...
[ RUN ] Rotmg_Usm/Rotmg_UsmFloat.test/alloc_usm__d1_2p9__d2_27431224__x1_1p50__y1_0p0__will_overflow_0
/tmp/portBLAS/test/unittest/blas1/blas1_rotmg_test.cpp:134: Failure
Value of: isAlmostEqual
Actual: false
Expected: true
[ FAILED ] Rotmg_Usm/Rotmg_UsmFloat.test/alloc_usm__d1_2p9__d2_27431224__x1_1p50__y1_0p0__will_overflow_0, where GetParam() = ("usm", 2.1, 2.74312e+07, 1.5, 5.72622e-08, false) (0 ms)
...
[ RUN ] Rotmg_Buffer/Rotmg_BufferFloat.test/alloc_buf__d1_2p9__d2_27095732__x1_1p50__y1_0p0__will_overflow_0
/tmp/portBLAS/test/unittest/blas1/blas1_rotmg_test.cpp:134: Failure
Value of: isAlmostEqual
Actual: false
Expected: true
[ FAILED ] Rotmg_Buffer/Rotmg_BufferFloat.test/alloc_buf__d1_2p9__d2_27095732__x1_1p50__y1_0p0__will_overflow_0, where GetParam() = ("buf", 2.1, 2.70957e+07, 1.5, 5.46859e-08, false) (1 ms)
Ubuntu version: 22.04.1
CMake version: 3.27.4
icpx version: Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
CPU: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Hello @fbarbari,
Thank you for your interest in portBLAS. Could you please share the name of the reference BLAS implementation you are using for these tests?
There is a known issue with `rotg` and `rotmg`, where some of the tests fail when they are compiled with the `fast-math` compiler flag, as documented here: https://github.com/codeplaysoftware/portBLAS/blob/fec888faae176bb2f86f3eaa9a4cd7739606052a/test/unittest/CMakeLists.txt#L106. We had to explicitly disable `fast-math` for these tests to make sure they pass. We will investigate this further and get back to you soon. Thanks.
> Could you please share the name of the reference BLAS implementation you are using for these tests?
CMake tells me this:
...
-- Found SystemBLAS: BLAS_LIBRARIES
...
which I don't think is very informative. I have OpenBLAS 0.3.24 installed on this system.
Hello @fbarbari,
Looking at this issue, I remember we had problems with `rotg` and `rotmg` before: they have many edge cases that are not well defined, and different libraries can give different outputs. In particular, we discussed this issue with `rotmg` and OpenBLAS in https://github.com/codeplaysoftware/portBLAS/pull/376. In the end, we decided to make our implementations match netlib BLAS and cuBLAS. I have added some documentation for this issue in https://github.com/codeplaysoftware/portBLAS/pull/506.
I suggest using netlib BLAS if you want all tests to be green.
I'm sorry, I was mistaken. Locally, OpenBLAS gives correct results for these tests, although I am using OpenBLAS 0.3.20 and my integrated GPU is a UHD Graphics 770. I see there have been some changes regarding `rotg` in the OpenBLAS release notes. We will revisit how we want to approach this issue.
Hi @fbarbari,
We looked into this issue, and on our side everything works fine with OpenBLAS 0.3.26.
We were able to reproduce the test failures when the configuration picks up a different BLAS library provided by the oneAPI toolkit. PR #509 is open to fix this and make the behavior clearer.
If you don't want to wait for it to be merged, I suggest reconfiguring and recompiling portBLAS with two flags that specify the OpenBLAS path: `-DOPENBLAS_LIBRARIES=/path/to/openblas/lib` and `-DOPENBLAS_INCLUDE_DIRS=/path/to/openBLAS/include`.
Hello @fbarbari, I think this issue is solved, so I am going to close it. If you feel your problem is not resolved, please reopen it or open a new issue. Thank you!