OpenBLAS ARMV7 (with hard float flag) did not run with correct result

hello,

Recently, we have been using OpenBLAS to set up a Caffe environment on our ARMv7 platform, but we ran into a problem when running OpenBLAS with the hard-float flag.
We compiled OpenBLAS with the following command:
       make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=ARMV7 libs

Then we ran a simple test using the following code:
#include <cstdlib>
#include <iostream>
#include <cblas.h>

using namespace std;

int main()
{
    const enum CBLAS_ORDER Order = CblasRowMajor;
    const enum CBLAS_TRANSPOSE TransA = CblasNoTrans;
    const enum CBLAS_TRANSPOSE TransB = CblasNoTrans;
    const int M = 4;   // rows of A and C
    const int N = 2;   // columns of B and C
    const int K = 3;   // columns of A, rows of B
    const float alpha = 1;
    const float beta = 0;
    const int lda = K;
    const int ldb = N;
    const int ldc = N;
    const float A[M*K] = {1.123434543534,2.33234241365,3.4534545454,4.45435435345,5.454554545,6.45452345345,7.454545465,8.454545245,9.2345245625,8.45234545,7.423564545,6.425452454};
    const float B[K*N] = {5.4523452345,4.34526547,3.462354544,2.52436254,1.262565262,0.265364564565};
    float C[M*N];

    // C = alpha * A * B + beta * C, all matrices row-major
    cblas_sgemm(Order, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);

    // C is M x N in row-major order, so consecutive rows are ldc (= N)
    // elements apart. Striding by M instead would skip rows and run past
    // the end of C.
    for (int i = 0; i < M; i++)
    {
        for (int j = 0; j < N; j++)
        {
            cout << C[i*ldc + j] << " ";
        }
        cout << endl;
    }

    return EXIT_SUCCESS;
}

Command used to compile the test code:
       arm-linux-gnueabihf-g++ -mfpu=vfpv3 -mfloat-abi=hard -o test testblas.cpp /usr/local/arm/openblas/lib/libopenblas_armv7p-r0.2.20.dev.a -I/usr/local/arm/boost/include/ -lpthread

But when we ran the test code on our ARMv7 platform, we got a strange result, as below:
1.4013e-45 1.4013e-45
1.4013e-45 1.4013e-45
1.4013e-45 1.4013e-45
1.4013e-45 1.4013e-45

This is not the correct result.

When we used the OpenBLAS library in our Caffe code, calls to the OpenBLAS APIs caused a core dump.

Can you help with this? Thank you very much.

Apr 07 '17 07:04 gangm

Would your hardware also allow building with the 64-bit ARMv8 target for comparison? There was a similar report in #1088, where I suggested reverting a small change from a year ago; unfortunately, it seems nobody tried it.

Apr 07 '17 09:04 martin-frbg

I tried your code on ARM32 QEMU (since I don't have an ARMv7 machine) with the latest OpenBLAS develop branch. The following is the result.

18.561 11.6857 
81.5766 56.1848
1.12343 2.33234
5.45455 6.45452

On ARMv8 and Intel also, I am getting the same result.

So the issue I believe, is related to your ARMv7 setup, and not OpenBLAS.

Apr 07 '17 10:04 ashwinyes

hello,

1. Our hardware doesn't support 64-bit ARMv8, so we can't build for it for comparison.

2. "So the issue I believe, is related to your ARMv7 setup, and not OpenBLAS." What did "related to your ARMv7 setup" mean? Do you mean our hardware setup or our ARM cross-compile environment?

3. I have another question: which version (branch) should I use? I am now trying the "arm_soft_fp_abi" branch with the compile command "make CC=arm-none-linux-gnueabi-gcc TARGET=ARMV7 NOFORTRAN=1 HOSTCC=gcc ARM_SOFTFP_ABI=1", and the result is correct. (Hard-float mode does not work in this branch either.) But when I tried the "master" branch with a similar compile command (make CC=arm-none-linux-gnueabi-gcc TARGET=ARMV7 NOFORTRAN=1 HOSTCC=gcc NO_LAPACK=1 ONLY_CBLAS=1, whether in hard, softfp, or soft mode), the result was strange (sometimes all zeros, sometimes values like 1.4013e-45, and so on).

Apr 10 '17 01:04 gangm

"So the issue I believe, is related to your ARMv7 setup, and not OpenBLAS." What did "related to your ARMv7 setup" mean? Do you mean our hardware setup or our ARM cross-compile environment?

I meant your hardware setup. Could you please share the output of /proc/cpuinfo from your ARMv7 machine?

Apr 10 '17 04:04 ashwinyes

hello:

@ashwinyes, the cpuinfo is as below:

Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 1988.29

processor : 1
BogoMIPS : 1988.29

processor : 2
BogoMIPS : 1988.29

processor : 3
BogoMIPS : 1988.29

Features : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10

Hardware : Freescale i.MX 6Quad/DualLite/Solo Sabre-SD Board
Revision : 63015
Serial : 0c17a1d4e6b573b3

Apr 10 '17 05:04 gangm

@gangm Thanks for sharing the cpuinfo. I wanted to check that your processor actually supports vfpv3.

Now, another thing to check would be that all libraries being used (including boost, caffe, pthread, etc.) conform to "-mfloat-abi=hard". You may use the steps mentioned in [http://stackoverflow.com/questions/20555594/how-can-i-know-if-an-arm-library-is-using-hardfp] to check it.

OR

You can try building your standalone program without using any other library except OpenBLAS.
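
A standalone program can also report which float ABI it was compiled for. The sketch below relies on compiler-predefined macros as I understand them for GCC and Clang on 32-bit ARM (`__arm__`, `__ARM_PCS_VFP`, `__SOFTFP__`); the function name `float_abi_of_this_build` is hypothetical. It only describes the translation unit it is compiled in, not OpenBLAS or any other linked library.

```cpp
#include <string>

// Hypothetical helper: reports the float ABI this translation unit was
// compiled for, based on macros predefined by GCC and Clang (an assumption;
// other compilers may use different macros).
std::string float_abi_of_this_build() {
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__ARM_PCS_VFP)
    return "hard";     // float arguments are passed in VFP registers
#elif defined(__SOFTFP__)
    return "soft";     // software floating point, integer-register passing
#else
    return "softfp";   // FP instructions, but integer-register passing
#endif
}
```

Printing this value from the test program built with the same flags as the failing binary would confirm that `-mfloat-abi=hard` actually reached the compiler.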

Apr 10 '17 05:04 ashwinyes

Googling further, I found the following issues, which also look related to the issue at hand.

https://github.com/sh1r0/caffe-android-lib/issues/27
https://github.com/sh1r0/caffe-android-lib/issues/37
https://github.com/xianyi/OpenBLAS/issues/777

@xianyi would be the right person to comment on the extent of softfp support in the latest OpenBLAS code.

Apr 10 '17 06:04 ashwinyes

@ashwinyes thanks for your reply.

I tried a standalone program that uses only the OpenBLAS library, and I can see that it supports VFPv3, as below:

Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "7-A"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv3
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: int
  Tag_ABI_HardFP_use: SP and DP
  Tag_ABI_VFP_args: VFP registers
  Tag_CPU_unaligned_access: v6
  Tag_DIV_use: Not allowed

The result is also strange, as below:
1.34409e+38 1.34409e+38
1.34409e+38 1.34409e+38
1.34409e+38 1.34409e+38
1.34409e+38 1.34409e+38

@xianyi, do you know whether OpenBLAS supports ARMv7 in hard-float mode? I tried many branches and many methods, but nothing seems to work. How much would performance improve in hard-float mode compared with softfp mode? (We can use softfp mode instead, but it is a little slow.)

Apr 10 '17 06:04 gangm

@gangm Can you give the "readelf" output for your pthread and boost libraries as well?

Apr 10 '17 07:04 ashwinyes

@ashwinyes readelf of pthread is as below:

File Attributes
  Tag_CPU_name: "7-A"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv3
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: int
  Tag_ABI_HardFP_use: SP and DP
  Tag_ABI_optimization_goals: Aggressive Speed
  Tag_DIV_use: Not allowed
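
One difference between the two attribute dumps in this thread is that the standalone test binary lists `Tag_ABI_VFP_args: VFP registers`, while the pthread dump does not list `Tag_ABI_VFP_args` at all. As far as I know, that tag is what marks an object as using the hard-float calling convention, so its absence may mean the library was built for the base (softfp-style) convention. The sketch below automates that comparison on the text printed by `readelf -A`; the function name `declares_hard_float_abi` is hypothetical, and a plain substring match is a simplification.

```cpp
#include <string>

// Hypothetical helper: given the text printed by `readelf -A <file>`,
// report whether the attributes declare the hard-float calling convention.
// Assumption: the tag is printed exactly as in the dumps in this thread.
bool declares_hard_float_abi(const std::string& readelf_attributes) {
    return readelf_attributes.find("Tag_ABI_VFP_args: VFP registers")
           != std::string::npos;
}
```

Feeding it each library's attribute dump gives a quick yes/no per file, which makes a mixed set of hard-float and non-hard-float objects easy to spot.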

Apr 10 '17 11:04 gangm

@gangm I encountered the same problem. More info:

1. Built everything with armeabi-v7a-hard with NEON and got wrong results, just as in your case.
2. Built the others with armeabi-v7a with NEON and changed the OpenBLAS build script:

sed -i -e 's/float-abi=hard/float-abi=softfp/g' Makefile.arm

Caffe loaded the model, but running Forward() crashed. I guess the problem is in OpenBLAS (the forward pass uses BLAS, loading does not).

Update

Branch https://github.com/xianyi/OpenBLAS/tree/arm_soft_fp_abi works for armeabi-v7a with NEON, so please ignore my item 2.

Apr 28 '17 04:04 solrex

Please note that the arm_soft_fp_abi branch is a work in progress; only a handful of functions have been modified for the softfp ABI so far.

Apr 28 '17 09:04 martin-frbg

Hello!

I compiled OpenBLAS for Android and linked with the specified flags to avoid issues with hard float. I have the same issue, as well as other issues reported by others on armv7 with hard float on Android (some functions returning zero or other incorrect values, or not returning at all). They work fine on all other architectures I tried, including armv5, arm64 (armv8), and x86. I did not open another issue, as I found plenty of open ones that mention the same problem, including #777, #853, #894, and #1088.

Do we know anything about the cause of these issues?

I understand that adding soft float support is in progress (as also mentioned in this thread), but so far it has been done for only a handful of functions on another branch. Are there other ways around this issue? One workaround would be to just use the armv5 libraries, which work fine on armv7 as well, but I did some benchmarking and found them to be around 50 times slower for certain operations, such as multiplying big float or double matrices, which is pretty much expected.

May 19 '17 14:05 scorpeeon

@gangm @scorpeeon @martin-frbg I also use the "arm_soft_fp_abi" branch with the compile command "make TARGET=ARMV7 NOFORTRAN=1 HOSTCC=gcc ARM_SOFTFP_ABI=1", and the result is correct.

But when I use the hard-float flag, it compiles successfully, yet cblas_sgemm() does not work normally when tested: it fails with a segmentation fault. I think the cause is the assembly code, because when I use the C code it works fine.

Jun 20 '17 06:06 ctgushiwei

@ctgushiwei do you have any idea about my question: https://stackoverflow.com/questions/57534249/compile-the-flutter-engine-with-hard-float-type-library

Thanks in advance for your help!

Aug 17 '19 07:08 ping996
