AMDMIGraphX
Softmax JIT kernel: large number of GPU work-items created for certain shapes
Problem Description
For the softmax operator (axis=2) on a float_type input of shape {512, 4, 1067, 6}, the softmax JIT kernel is launched with Global = 139853824 (512 x 4 x 1067 x 64) and Local = 64. The kernel could be optimized to make better use of the 64 lanes.
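To put the launch size in perspective, here is a rough back-of-the-envelope sketch in Python. It only redoes the arithmetic from the numbers quoted above; the variable names are illustrative and it does not reflect how MIGraphX actually computes its launch parameters.

```python
from math import prod

# Numbers taken from the report above; names are illustrative only.
shape = (512, 4, 1067, 6)   # input tensor shape
local_size = 64             # reported Local (work-items per workgroup)

# Reported Global, written out as its stated factorization.
global_size = 512 * 4 * 1067 * local_size

num_workgroups = global_size // local_size   # workgroups launched
num_elements = prod(shape)                   # elements in the tensor

# How many work-items are launched per tensor element.
work_items_per_element = global_size / num_elements

print(f"global size:            {global_size}")
print(f"workgroups:             {num_workgroups}")
print(f"tensor elements:        {num_elements}")
print(f"work-items per element: {work_items_per_element:.2f}")
```

With these numbers the launch creates roughly ten times more work-items than there are elements in the tensor, which suggests most of the 64 lanes in each workgroup have nothing to do.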
Operating System
Ubuntu 22.04.4 LTS
CPU
Intel Xeon Platinum 8480C
GPU
AMD Instinct MI300
Other
No response
ROCm Version
ROCm 6.0.0
Steps to Reproduce
bin/verify test_softmax*
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response