Assertion failure in brgemm in debug build on G3 aarch64 machine
Summary
When running ctest -R cpu-tutorials-matmul-matmul-quantization-cpp with a debug build on a G3 aarch64 machine, an assertion failure can be seen in brgemm_matmul_utils.cpp.
Version
v3.6.0 (commit fbb277db5a9c67fc646264f8ff68eb6f24f411fb)
Environment
oneDNN includes hardware-specific optimizations and may behave differently depending on the compiler and build environment. Include the following information to help reproduce the issue:
- CPU make and model - aarch64
- OS version - 22.04.1-Ubuntu
- Compiler version - 11.4.0
- CMake version 3.22.1
- CMake output log
- git hash - fbb277db5a9c67fc646264f8ff68eb6f24f411fb
Steps to reproduce
- Build in debug mode
- Run
ctest -R cpu-tutorials-matmul-matmul-quantization-cpp
Observed behavior
The test fails with the following message:
cpu-tutorials-matmul-matmul-quantization-cpp: /home/ubuntu/oneDNN/src/cpu/aarch64/matmul/brgemm_matmul_utils.cpp:133: dnnl::impl::status_t dnnl::impl::cpu::aarch64::matmul::check_isa_with_datatype(dnnl::impl::cpu::aarch64::cpu_isa_t, const dnnl::impl::cpu::aarch64::matmul::brgemm_matmul_conf_utils_t&): Assertion `bm_conf_utils.is_f32()' failed.
Expected behavior
Test passes
@jondea, @cfRod This is an aarch64 platform-specific issue. Can you please have a look into this? Thanks.
Hi @rupakroyintel, this appears to be an issue from the JIT'ed path on AArch64. Linking @vineelabhinav as the author of the following PR https://github.com/oneapi-src/oneDNN/pull/1815/files that added brgemm.
@vineelabhinav Can you please look into this issue? Thanks.
Hi @Ryo-not-rio @cfRod @rupakroyintel, this is expected behaviour from the JIT side.
- We have implemented brgemm only for the f32 data type.
- When oneDNN is built in Release mode, it skips our implementation, goes to the reference implementation, and executes that. Therefore the test does not fail.
- In Debug mode, however, oneDNN tries to use our brgemm implementation, and if that fails, it stops there and does not go on to the reference implementation. Therefore the test does not pass in Debug mode.
We intentionally added this assertion so that it falls back to the reference implementation when the f32 data type is not used.
As far as possible, behavior in debug mode should match release mode, and assertions should not be expected in the test suite. Sorry if I've misunderstood, but is there a way to use normal flow control to fall back to reference even in debug mode rather than assert?
Bumping this issue, as it was very frustrating for me to discover this bug. It stops development in debug mode. For a while I thought it was an issue in the work I was doing, and I wasted quite a bit of time before I found this issue. I agree with @jondea that logical behaviour in debug mode should match release mode.
Hi, I have fixed this issue in PR https://github.com/oneapi-src/oneDNN/pull/1985. Please have a look.