OpenBLAS
cblas_sgemm return all zero
Hi Xianyi,
We tried to run a matrix multiplication with cblas_sgemm or cblas_dgemm on Android, using A = [1 3 4 6], B = [3 5 9 1], and C = A * B. We initialized C to all zeros. After the call, C does not contain A * B but is still all zeros. Do you have any hints? Thanks!
Could you please state which version of OpenBLAS you used and what kind of Android hardware (CPU), and perhaps post the code snippet with your cblas_sgemm call, just in case you got a calling parameter wrong? What you could do (if you have not already done so) is confirm that your code works with some other implementation of (c)blas, e.g. the netlib reference implementation.
@martin-frbg
- We used the latest OpenBLAS code.
- We tried two devices and ended up with the same issue on both: (1) Google Pixel, Qualcomm Snapdragon 821; (2) Samsung G920.
- Test code:
    float A[4] = {1, 2, 3, 4};
    float B[4] = {1, 1, 1, 1};
    float C[4] = {0, 0, 0, 0};
    int M = 2;
    int N = 2;
    int K = 2;
    float alpha = 1.0f;
    float beta = 0.0f;
    int lda = 2;
    int ldb = 2;
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, lda, B, ldb, beta, C, N);
- How we compiled OpenBLAS for Android (we compiled on both Ubuntu and Mac OS and got the same results):
    set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--no-warn-mismatch -lm_hard ${CMAKE_SHARED_LINKER_FLAGS}")
    add_library(lib_openblas STATIC IMPORTED)
    set_target_properties(lib_openblas PROPERTIES IMPORTED_LOCATION ../../../../../thirdparty/OpenBLAS/lib/armeabi/libopenblas.a)
    target_link_libraries(
        # Specifies the target library.
        native-lib
        ${log-lib}
        lib_openblas)
- We have confirmed that several functions do work: (1) cblas_copy (2) cblas_swap
- Two major functions do not work: (1) cblas_sgemm (2) cblas_dgemm
  Error: C is initialized to zero and always comes back as zero.
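For comparison, here is a minimal sketch of what the posted 2x2 test case should produce, using a naive reference loop instead of an optimized kernel. The helpers `ref_sgemm` and `run_posted_case` are hypothetical names introduced for illustration only; they are not part of OpenBLAS or CBLAS. The sketch assumes row-major storage and no transposes, matching the arguments in the posted call.

```c
#include <assert.h>

/* Naive row-major reference for C = alpha * A * B + beta * C (no transposes).
 * ref_sgemm is a hypothetical helper, not part of OpenBLAS or CBLAS. */
static void ref_sgemm(int M, int N, int K, float alpha,
                      const float *A, int lda,
                      const float *B, int ldb,
                      float beta, float *C, int ldc)
{
    /* In row-major layout each leading dimension must cover a full row. */
    assert(lda >= K && ldb >= N && ldc >= N);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[i * lda + k] * B[k * ldb + j];
            /* BLAS convention: when beta is zero, C is overwritten, not read. */
            if (beta == 0.0f)
                C[i * ldc + j] = alpha * acc;
            else
                C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j];
        }
    }
}

/* Runs the 2x2 test case from the post and stores the result in C. */
static void run_posted_case(float C[4])
{
    const float A[4] = {1, 2, 3, 4};
    const float B[4] = {1, 1, 1, 1};
    ref_sgemm(2, 2, 2, 1.0f, A, 2, B, 2, 0.0f, C, 2);
}
```

With A = {1, 2, 3, 4} and B = {1, 1, 1, 1}, the reference result is C = {3, 3, 7, 7}, so an all-zero C after cblas_sgemm cannot be a legitimate product for these inputs.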
I am not really familiar with Android compilation, so I will ask about the statement below, since I can help more on Ubuntu.
"We have compiled on both Ubuntu and Mac OS, and got the same results". Can you please list the steps you used to compile?
@ashwinyes
Steps we used to compile:
- Install Android Studio and download the NDK.
- Download the latest OpenBLAS code with git clone https://github.com/xianyi/OpenBLAS.git
- Compile OpenBLAS with the following commands:
    export NDK_ROOT=path_to_ndk_bundle
    export APP_ABI=android-21
    export CFLAGS="--sysroot=${NDK_ROOT}/platforms/${APP_ABI}/arch-arm"
    export PATH=$NDK_ROOT/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin:$PATH
    make TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc NOFORTRAN=1
    make PREFIX=dist/armeabi/ install
- Create an Android project in Android Studio with C++ support.
- Copy the OpenBLAS headers and libs to the Android project.
- Add the test code to the Android cpp file:
    float A[4] = {1, 2, 3, 4};
    float B[4] = {1, 1, 1, 1};
    float C[4] = {0, 0, 0, 0};
    int M = 2;
    int N = 2;
    int K = 2;
    float alpha = 1.0f;
    float beta = 0.0f;
    int lda = 2;
    int ldb = 2;
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, lda, B, ldb, beta, C, N);
- Compile the Android project with the following config in CMakeLists.txt:
    # link open blas
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mhard-float -D_NDK_MATH_NO_SOFTFP=1")
    set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--no-warn-mismatch -lm_hard ${CMAKE_SHARED_LINKER_FLAGS}")
    add_library(lib_openblas STATIC IMPORTED)
    set_target_properties(lib_openblas PROPERTIES IMPORTED_LOCATION ../../../../../thirdparty/OpenBLAS/lib/armeabi/libopenblas.a)
    target_link_libraries(
        # Specifies the target library.
        native-lib
        # Links the target library to the log library
        # included in the NDK.
        ${log-lib}
        lib_openblas)
- Debug the Android project on Android devices; we found that the values in C never change.
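Because C starts out as all zeros in the test, "C was never written" and "the routine wrote zeros" look identical in the debugger. One way to tell them apart, sketched below, is to pre-fill C with a sentinel value before the call. The names `SENTINEL`, `fill_sentinel` and `count_untouched` are hypothetical helpers introduced here, and the sentinel value is an arbitrary choice that the test product cannot produce.

```c
#include <stddef.h>

/* Arbitrary finite marker; the posted test case can never produce this value. */
#define SENTINEL (-12345.0f)

/* Overwrite all n entries of C with the sentinel before calling the routine. */
static void fill_sentinel(float *C, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        C[i] = SENTINEL;
}

/* Count how many entries still hold the sentinel after the call. */
static size_t count_untouched(const float *C, size_t n)
{
    size_t untouched = 0;
    for (size_t i = 0; i < n; ++i)
        if (C[i] == SENTINEL)
            ++untouched;
    return untouched;
}
```

A possible use: call `fill_sentinel(C, 4)`, run the same cblas_sgemm call with beta = 0.0f, then inspect `count_untouched(C, 4)`. A count of 4 would mean the routine never stored to C, while a count of 0 with all entries equal to zero would suggest the routine ran but saw zero inputs or a zero alpha.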
Thanks for the detailed steps. I don't have an arm32 machine to test on.
I tested on an aarch64 machine with TARGET=ARMV8 and the result is correct.
Since the devices you are testing on also have aarch64 processors, could you please try compiling for TARGET=ARMV8 using the Android aarch64 compilers and let us know the results?
Thanks
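When switching between ARMV7 and ARMV8 builds, it can help to confirm which architecture and floating-point calling convention the test code itself is compiled for. The sketch below is a hedged diagnostic, not something taken from the thread: `target_arch` is a hypothetical helper, and it assumes a GCC or Clang toolchain, which predefine `__aarch64__`, `__arm__`, and (for the hard-float calling convention on 32-bit ARM) `__ARM_PCS_VFP`.

```c
/* Report the architecture this translation unit was compiled for,
 * based on compiler-predefined macros (GCC/Clang assumed). */
static const char *target_arch(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
  #if defined(__ARM_PCS_VFP)
    /* Floating-point arguments are passed in VFP registers. */
    return "arm32 (hard-float calling convention)";
  #else
    /* Floating-point arguments are passed in integer registers. */
    return "arm32 (soft or softfp calling convention)";
  #endif
#else
    return "other";
#endif
}
```

Logging the returned string from the native library next to the cblas_sgemm result would show whether the application and the intended OpenBLAS TARGET agree.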
There was a change in the register-zeroing code of the s/dgemm_kernel_4x4_vfpv3.S functions for ARMv7 a year ago, when it was found that the previous code could not clear a spurious NaN (see #740). If you need to get this working on ARMv7, perhaps it would make sense to go back to 5f2fa15 (git checkout) for a quick test?
I note that strange behaviour in cblas_sgemm was also observed in #1014, although the circumstances there were less clear.