oneDNN

brgemm_matmul segfaults with multiple threads when broadcasting dims

Open • Sqvid opened this issue 1 month ago • 1 comment

Summary

brgemm_matmul segfaults with multiple threads when broadcasting dims.

Version

main: 976bf2d4eb61582c1655e69208ff8173a93d8b45

Environment

oneDNN includes hardware-specific optimizations and may behave differently depending on the compiler and build environment. Include the following information to help reproduce the issue:

  • CPU: x64 and AArch64
  • OS version: Linux 6.14
  • git hash: 976bf2d4eb61582c1655e69208ff8173a93d8b45

Steps to reproduce

On x64:

$ ONEDNN_VERBOSE=profile_create,profile_exec OMP_NUM_THREADS=2 ./build/tests/benchdnn/benchdnn --matmul --mode=R --stag=abcd --dtag=abcd 2x1x40x20:1x1x20x40
onednn_verbose,v1,info,oneDNN v3.11.0 (commit 976bf2d4eb61582c1655e69208ff8173a93d8b45)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:2
onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,v1,info,gpu,runtime:none
onednn_verbose,v1,info,graph,backend,0:dnnl_backend
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,implementation,backend,exec_time
onednn_verbose,v1,primitive,create:cache_miss,cpu,matmul,brg_matmul:avx512_core,undef,src:f32::blocked:abcd::f0 wei:f32:a:blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,2x1x40x20:1x1x20x40,0.2771
onednn_verbose,v1,primitive,create:cache_hit,cpu,matmul,brg_matmul:avx512_core,undef,src:f32::blocked:abcd::f0 wei:f32:a:blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,2x1x40x20:1x1x20x40,0.00195312
onednn_verbose,v1,primitive,exec,cpu,matmul,brg_matmul:avx512_core,undef,src:f32::blocked:abcd::f0 wei:f32:a:blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,2x1x40x20:1x1x20x40,0.194824
0:EXECUTED (1 ms) __REPRO: --mode=R --mode-modifier=M --matmul --stag=abcd --dtag=abcd 2x1x40x20:1x1x20x40
============================================================
= Implementation statistics (--summary=no-impl to disable) =
============================================================
| brg_matmul:avx512_core : 1 (100%)                        |
============================================================
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.00s; create_pd: 0.00s (30%); create_prim: 0.00s (38%); fill: 0.00s (0%); execute: 0.00s (14%);
Segmentation fault (core dumped)

Observed behavior

Segmentation fault.

Expected behavior

I would expect the run to complete without a segmentation fault.

Triage

This bug appears to affect both the x64 and AArch64 paths. I did the triage on the AArch64 side, but I suspect the x64 crash has the same root cause.

Essentially, we calculate the batch address here: https://github.com/uxlfoundation/oneDNN/blob/6fd57103715166bd59bf1fd6989003e61e201bf0/src/cpu/aarch64/matmul/brgemm_matmul.cpp#L368

That calculation depends on the thread number: https://github.com/uxlfoundation/oneDNN/blob/6fd57103715166bd59bf1fd6989003e61e201bf0/src/cpu/aarch64/matmul/brgemm_matmul.cpp#L1012-L1015

This means that, when broadcasting, the following pointer points to garbage: https://github.com/uxlfoundation/oneDNN/blob/77dfcef253f65be5403d893e947906858bf5b6bb/src/cpu/aarch64/brgemm/brgemm_types.hpp#L102

The kernel therefore segfaults when it later accesses that pointer at execution time: https://github.com/uxlfoundation/oneDNN/blob/6fd57103715166bd59bf1fd6989003e61e201bf0/src/cpu/aarch64/brgemm/jit_brgemm_kernel.cpp#L1451-L1456

On the AArch64 path, the acl_matmul implementation is selected for this shape, so it does not crash, but the bug in brgemm_matmul is still present. On x64 it crashes out of the box.

I'd greatly appreciate advice on how to approach the fix (probably just adding a broadcast branch to the batch pointer calculation?).

@dzarukin @vpirogov

Sqvid • Nov 28 '25 13:11

@densamoilov

dzarukin • Dec 01 '25 18:12