dnnl_aarch64

Questions about __ARM_ARCH

Open chauncyyoung opened this issue 5 years ago • 7 comments

I have two questions below:

First, when I used extended_sgemm, I found that it takes the __ARM_ARCH branch by default, but I cannot find where __ARM_ARCH is defined. Could you help me with this?

Second, I tried to use jit_avx512_common_gemm_f32, but the build failed because of undefined references in libmkldnn. Should I adjust other parameters to run it?

  • OS version: aarch64 GNU/Linux
  • Compiler version: gcc (Ubuntu/Linaro 5.4.0-6kord1~16.04.12) 5.4.0 20160609
  • MKLROOT value (echo MKLROOT=$MKLROOT)

#ifdef __ARM_ARCH
//    return ref_gemm<float>(transa, transb,
//            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
//else // #ifdef __ARM_ARCH
    if (mayiuse(avx512_mic)) {
        printf("enter 1\n");
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        printf("enter 2\n");
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else {
        printf("enter 3\n");
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
#endif // #ifdef __ARM_ARCH

chauncyyoung avatar Jun 15 '20 14:06 chauncyyoung

Hi chauncyyoung

Regarding the first question: __ARM_ARCH is a macro that the compiler predefines when it targets an Arm architecture, so it is not defined anywhere in the source code.

Regarding the second question: Are you seeing a build error? In our environment the error does not occur. Did you modify any source code other than the above?
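
As a minimal sketch of what "predefined" means here: the snippet below reads __ARM_ARCH purely through the preprocessor, without any header or build flag defining it. The function name arm_arch_version is hypothetical and not part of dnnl_aarch64. With GCC or Clang, the predefined macros can usually also be listed with `gcc -dM -E - < /dev/null`.

```cpp
// Hypothetical helper (not part of dnnl_aarch64): reports the value of the
// compiler-predefined __ARM_ARCH macro, or 0 when the target is not Arm.
constexpr int arm_arch_version() {
#ifdef __ARM_ARCH
    // The compiler itself defines this macro when targeting Arm;
    // no source file or CMake option has to provide it.
    return __ARM_ARCH;
#else
    // Non-Arm targets (for example x86-64) never see the macro.
    return 0;
#endif
}
```

On an AArch64 build this function should return a non-zero architecture version, which is why the `#ifdef __ARM_ARCH` branch in gemm.cpp is taken without any explicit definition.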

Takumi-Honda avatar Jun 16 '20 09:06 Takumi-Honda

Thank you for your reply. I just tried to skip the ref_gemm case by adding "//" before it, as follows:

#ifdef __ARM_ARCH
//    return ref_gemm<float>(transa, transb,
//            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
//#else // #ifdef __ARM_ARCH
    if (mayiuse(avx512_mic)) {
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else {
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
#endif // #ifdef __ARM_ARCH

The errors then occurred during the build. No other source code had been modified, so I don't know whether I should modify the cmake files. I also found that this case uses kernel_table[isTransA][isTransB][hasBias][beta_idx(beta)] = new xbyak_gemm(isTransA, isTransB, beta, hasBias); in ./jit_avx512_common_gemm_f32.cpp, and xbyak_gemm uses AVX512 instructions such as 'vgatherqps' (I'm not sure, because I'm a newcomer...). If that's true, would they be translated by xbyak into Arm assembly? Thank you again for helping me!!! :)

chauncyyoung avatar Jun 16 '20 11:06 chauncyyoung

Hi chauncyyoung-san

Thank you for trying dnnl_aarch64.

I tried your procedure.

  • cloned dnnl_aarch64
  • modified gemm.cpp (#ifdef __ARM_ARCH -> #ifdef __ARM_ARCH_)
  • and finally ran cmake and make

#ifndef __ARM_ARCH_
    if (mayiuse(avx512_mic)) {
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else
#endif // __ARM_ARCH
    {
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }

All binaries build successfully in my environment, but ./test_gemm_f32 crashes with a segmentation fault in the 14th test pattern (TestGEMM/13). I'll try to fix the bug.

[==========] Running 21 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 21 tests from TestGEMM_fp32/gemm_test
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/0
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/0 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/1
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/1 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/2
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/2 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/3
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/3 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/4
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/4 (3296 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/5
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/5 (9 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/6
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/6 (11 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/7
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/7 (12 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/8
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/8 (22 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/9
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/9 (9 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/10
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/10 (8 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/11
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/11 (6 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/12
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/12 (1 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/13
zsh: segmentation fault (core dumped)  ./test_gemm_f32

kawakami-k avatar Jun 16 '20 12:06 kawakami-k

Currently, dnnl_aarch64 is assumed to run on a CPU that supports the Armv8-A+SVE instruction set. If you don't have such an environment, you can use QEMU to emulate Armv8-A+SVE instructions.
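
One way to check whether a given machine (or QEMU guest) actually exposes SVE is to read the hardware capability bits that Linux passes to each process. The following is a sketch under assumptions: the names kHwcapSveBit, hwcap_has_sve and cpu_has_sve are hypothetical, the bit position is taken from the Linux arm64 UAPI definition of HWCAP_SVE, and the check is only meaningful on AArch64 Linux. It is not how dnnl_aarch64 itself detects SVE.

```cpp
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP (Linux only)
#endif

// Assumption: bit 22 of AT_HWCAP is HWCAP_SVE in the Linux arm64 UAPI headers.
constexpr unsigned long kHwcapSveBit = 1UL << 22;

// Pure helper: tests the SVE bit in a hardware-capability word.
constexpr bool hwcap_has_sve(unsigned long hwcap) {
    return (hwcap & kHwcapSveBit) != 0;
}

// Returns true if the running CPU reports SVE support.
// On anything other than AArch64 Linux this conservatively returns false.
inline bool cpu_has_sve() {
#if defined(__aarch64__) && defined(__linux__)
    return hwcap_has_sve(getauxval(AT_HWCAP));
#else
    return false;
#endif
}
```

If cpu_has_sve() returns false on the target machine, SVE instructions in JIT-ed code would be expected to fault there, which is the situation where emulation becomes necessary.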

kawakami-k avatar Jun 16 '20 13:06 kawakami-k

Thank you for your reply. I think it may be affected by the version of dnnl_aarch64: I used the release_base_0.19 branch. I'll try the latest version with QEMU later.

chauncyyoung avatar Jun 22 '20 03:06 chauncyyoung

I'm also not sure about xbyak: does it translate x86 assembly to AArch64 assembly?

Another question arises in gemm.cpp, as follows:

#ifdef __ARM_ARCH
    return ref_gemm<float>(transa, transb,
            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
#else // #ifdef __ARM_ARCH
    if (mayiuse(avx512_mic)) {
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else {
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
#endif // #ifdef __ARM_ARCH
}

As ref_gemm comes before jit_avx512_common_gemm_f32, does ref_gemm have a higher priority? In other words, does ref_gemm perform better than jit_avx512_common_gemm_f32?

chauncyyoung avatar Jun 24 '20 02:06 chauncyyoung

chauncyyoung-san

"release_base_0.19" does not output any JIT-ed code except in jit_uni_reorder.cpp, so ref_gemm is always used on AArch64.
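
The ordering in the source is therefore a compile-time selection, not a runtime priority or a performance ranking. A minimal sketch of that structure follows; GemmKernel and select_gemm_kernel are hypothetical names, and the two boolean flags stand in for the mayiuse(avx512_mic) and mayiuse(avx) checks of the real code.

```cpp
// Hypothetical labels for the three code paths in gemm.cpp.
enum class GemmKernel { Reference, JitAvx, JitAvx512 };

// Mirrors the shape of the #ifdef in gemm.cpp: the preprocessor keeps exactly
// one of the two bodies, so on an Arm build the x86 branches do not exist in
// the compiled binary at all.
constexpr GemmKernel select_gemm_kernel(bool has_avx512, bool has_avx) {
#ifdef __ARM_ARCH
    // Arm build: only the reference implementation is compiled in,
    // regardless of the flags.
    return GemmKernel::Reference;
#else
    // x86 build: runtime dispatch, fastest available kernel first,
    // reference implementation as the fallback.
    return has_avx512 ? GemmKernel::JitAvx512
         : has_avx    ? GemmKernel::JitAvx
                      : GemmKernel::Reference;
#endif
}
```

In this sketch the reference kernel is the fallback on x86 and the only option on Arm, which matches the behaviour described for "release_base_0.19".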

Please use "release_base_0.21" to try various JIT-ed code on AArch64. This version generates some JIT-ed code directly by using Xbyak_aarch64; this is implemented in src/cpu/jit_sve_*.cpp. It also outputs some JIT-ed code indirectly by using Xbyak_translator_aarch64, which translates x86 JIT-ed instructions to AArch64 instructions one by one.

If you want to try JIT-ed gemm, replace

#ifndef __ARM_ARCH

at https://github.com/fujitsu/dnnl_aarch64/blob/release_base_0.21/src/cpu/gemm/gemm.cpp#L123 with

#ifdef __ARM_ARCH

Currently, "release_base_0.21" has some bugs in JIT-ed gemm, so it is disabled by default.

kawakami-k avatar Jun 24 '20 07:06 kawakami-k