Shared memory failure on softmax
When compiling the following IR: https://gist.github.com/rsuderman/c2ff931ca5ddad20f061f31f7a8847a2
We see a shared memory failure: https://gist.github.com/rsuderman/318fb1db5735d9d311060dfeebdf2cfd
@pashu123 can you take a look? cc @Groverkss
Using the command:
iree-compile test_softmax.mlir --iree-hip-target=gfx942 -o=abc.vmfb -iree-opt-level=O3 --iree-hal-target-device=hip
The problematic dispatch is https://gist.github.com/pashu123/74778fcd0526039861913d503e5b8e84
I see a gather followed by a softmax.
%13 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%cst, %10 : tensor<4xi64>, tensor<4xi64>) outs(%12 : tensor<4x128256xf32>) {
^bb0(%in: i64, %in_0: i64, %out: f32):
  %16 = arith.subi %in_0, %c1_i64 : i64
  %17 = linalg.index 1 : index
  %18 = arith.cmpi slt, %in, %c0_i64 : i64
  %19 = arith.addi %in, %c4_i64 : i64
  %20 = arith.select %18, %19, %in : i64
  %21 = arith.index_cast %20 : i64 to index
  %22 = arith.cmpi slt, %16, %c0_i64 : i64
  %23 = arith.index_castui %7 : index to i64
  %24 = arith.addi %16, %23 : i64
  %25 = arith.select %22, %24, %16 : i64
  %26 = arith.index_cast %25 : i64 to index
  %extracted = tensor.extract %9[%21, %26, %17] : tensor<4x?x128256xf16>
  %27 = arith.extf %extracted : f16 to f32
  linalg.yield %27 : f32
} -> tensor<4x128256xf32>
%14 = linalg.softmax dimension(1) ins(%13 : tensor<4x128256xf32>) outs(%12 : tensor<4x128256xf32>) -> tensor<4x128256xf32>
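For anyone reading along, here is a rough pure-Python sketch of what the two ops above appear to compute. It is only a reference for the semantics, not how IREE lowers or tiles them. The names `wrap_index`, `gather_rows`, and `softmax_rows` are made up for illustration, and subtracting the row max before exponentiating is an assumed (common) way to keep the softmax numerically stable.

```python
import math

def wrap_index(i, size):
    # Mirrors the arith.cmpi slt / arith.addi / arith.select chain:
    # a negative index is shifted up by the dimension size.
    return i + size if i < 0 else i

def gather_rows(src, batch_idx, seq_idx):
    # src is a nested list shaped [batch][seq][vocab] (like %9).
    # For each d0, pick one [vocab] row, as the linalg.generic does
    # with tensor.extract, and widen the values to float (arith.extf).
    out = []
    for b, s in zip(batch_idx, seq_idx):
        b = wrap_index(b, len(src))
        # The IR subtracts 1 from the second index before wrapping it.
        s = wrap_index(s - 1, len(src[b]))
        out.append([float(x) for x in src[b][s]])
    return out

def softmax_rows(rows):
    # Softmax along dimension 1 (within each row).
    result = []
    for row in rows:
        m = max(row)  # assumed stabilization step
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result
```

With the shapes in the dispatch, `gather_rows` would produce 4 rows of 128256 values each, and `softmax_rows` reduces across each full row.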
I raised a patch with the fix here: https://github.com/iree-org/iree/pull/21117