CSE and LICM don't work as expected with exp in the loop
I noticed that "CSE and LICM don't work as expected with `exp` in the loop" is mentioned in /python/triton/ops/flash_attention.py (credits to Adam P. Goucher @apgoucher). Can someone explain the reason for this comment? Has the problem been solved? Thank you so much.
https://github.com/openai/triton/blob/e2bdc8973feb41fc60d31472bdbe3b80c3ad8405/python/triton/ops/flash_attention.py#L59-L63
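For context, the workaround that comment documents is to scale by log2(e) once, outside the loop, and use 2^x instead of e^x inside the loop. Below is a minimal sketch of the pattern (hypothetical kernel and names, not the actual file contents), assuming `tl.math.exp2` as in the real kernel:

```python
import triton
import triton.language as tl

@triton.jit
def softmax_denom_sketch(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    # Online-softmax denominator for one row, reduced to the essentials.
    LOG2E = 1.44269504   # log2(e); flash_attention.py folds this into sm_scale once
    m = -float("inf")    # running max
    d = 0.0              # running denominator
    for start in range(0, N, BLOCK):
        offs = start + tl.arange(0, BLOCK)
        x = tl.load(x_ptr + offs, mask=offs < N, other=-float("inf"))
        m_new = tl.maximum(m, tl.max(x, 0))
        # 2^x instead of e^x inside the loop, via exp(t) == exp2(t * log2(e)),
        # so no `exp` call is left in the loop body for CSE/LICM to mishandle.
        d = d * tl.math.exp2((m - m_new) * LOG2E) \
            + tl.sum(tl.math.exp2((x - m_new) * LOG2E), 0)
        m = m_new
    tl.store(out_ptr, d)
```

In the linked file the same trick appears as `qk_scale = sm_scale * 1.44269504` hoisted ahead of the inner loop, with `tl.math.exp2` used in place of `tl.exp`.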
This may be an issue with upstream MLIR; I will investigate first.
I printed out the MLIR and found that the `exp` operation is constructed in this form:

```mlir
%146 = tt.extern_elementwise %145 {libname = "", libpath = "", pure = true, symbol = "__nv_expf"} : (tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #mma}>>) -> tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #mma}>> loc(#loc36)
```

Why not use the `math` dialect here?
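For comparison, the equivalent operation in the upstream `math` dialect would presumably be a single op like the following (a hypothetical rendering, assuming the same tensor type and encoding). Since `math.exp` carries no side effects, MLIR's generic CSE and LICM passes can in principle de-duplicate and hoist it:

```mlir
%146 = math.exp %145 : tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #mma}>>
```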
I also found that `exp` is converted to `exp2` in the `convert-triton-gpu-to-llvm` pass. I don't know much about this context, but if we built `math.exp` directly and then converted it to `exp2` afterwards, that would not prevent MLIR's compiler optimizations.
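For what it's worth, the `exp` → `exp2` conversion is just the identity exp(x) = exp2(x · log2(e)), so it changes the form of the computation but not the result (up to rounding). A quick NumPy check, purely illustrative:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 101, dtype=np.float32)
LOG2E = np.float32(1.4426950408889634)  # log2(e)

# exp(x) == exp2(x * log2(e)) up to float32 rounding; this is the same
# rewrite applied when `exp` is lowered to `exp2`.
assert np.allclose(np.exp(x), np.exp2(x * LOG2E), rtol=1e-5)
print("exp/exp2 identity holds")
```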