
CSE and LICM don't work as expected with exp in the loop

Open Li-dongyang opened this issue 1 year ago • 2 comments

I noticed that the comment "CSE and LICM don't work as expected with exp in the loop" appears in /python/triton/ops/flash_attention.py (credit to Adam P. Goucher, @apgoucher).

Can someone explain the reason behind this comment? Has the problem been solved since? Thank you very much.

https://github.com/openai/triton/blob/e2bdc8973feb41fc60d31472bdbe3b80c3ad8405/python/triton/ops/flash_attention.py#L59-L63
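For context, a common workaround that such a comment accompanies relies on the identity exp(x) = 2^(x · log2 e): the constant log2(e) is folded into the softmax scale once, before the loop, so the loop body only needs a base-2 exponential. A minimal Python sketch, assuming that is what the linked lines intend; the function names and the `sm_scale` parameter are illustrative and are not Triton's kernel code:

```python
import math

LOG2E = 1.4426950408889634  # log2(e), equal to 1 / math.log(2)

def softmax_numerators_exp(scores, sm_scale):
    # Reference version: exp is applied to the scaled score on every iteration.
    out = []
    for s in scores:
        out.append(math.exp(s * sm_scale))
    return out

def softmax_numerators_exp2(scores, sm_scale):
    # Workaround sketch: fold log2(e) into the scale once, outside the loop,
    # then use only a base-2 exponential inside the loop.
    qk_scale = sm_scale * LOG2E  # loop-invariant, computed a single time
    out = []
    for s in scores:
        out.append(2.0 ** (s * qk_scale))
    return out
```

Both functions return the same values up to floating-point rounding; the second simply leaves no log2(e) multiply for the compiler to hoist out of the loop itself.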

Li-dongyang · Jan 18 '24 11:01

This may be an issue in upstream MLIR; I will investigate first.

lipracer · Jan 24 '24 03:01

I printed out the MLIR and found that the exp operation is constructed in this form: `%146 = tt.extern_elementwise %145 {libname = "", libpath = "", pure = true, symbol = "__nv_expf"} : (tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #mma}>>) -> tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #mma}>> loc(#loc36)`. Why not use the MLIR `math` dialect here? I also found that exp is converted to exp2 in the convert-triton-gpu-to-llvm pass. I don't know much about the context, but if we built `math.exp` directly and only converted it to exp2 later, that would not prevent MLIR's compiler optimizations.
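To make the CSE/LICM point concrete: both passes may only deduplicate or hoist an op they can treat as side-effect free, and LICM additionally needs every operand to be loop-invariant. The toy Python sketch below models just that rule. The `Op` record, the op names, and the `pure` flag are hypothetical stand-ins; this does not reproduce how MLIR or Triton implement these passes, and it does not settle how `tt.extern_elementwise` is actually treated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    result: str      # SSA value this op defines
    name: str        # op name, e.g. "mulf" or "exp" (illustrative only)
    operands: tuple  # SSA values this op reads
    pure: bool       # whether the pass may assume no side effects

def cse(body):
    """Toy CSE: drop a pure op identical to an earlier one and reuse its result."""
    seen, out, rename = {}, [], {}
    for op in body:
        operands = tuple(rename.get(v, v) for v in op.operands)
        key = (op.name, operands)
        if op.pure and key in seen:
            rename[op.result] = seen[key]  # later uses point at the first copy
            continue
        if op.pure:
            seen[key] = op.result
        out.append(Op(op.result, op.name, operands, op.pure))
    return out

def hoist_invariants(body, loop_defined):
    """Toy LICM over a straight-line loop body.

    A pure op whose operands are all loop-invariant is moved out of the loop;
    anything else stays, and its result becomes loop-variant.
    """
    hoisted, remaining = [], []
    variant = set(loop_defined)  # values that change on each iteration
    for op in body:
        if op.pure and not any(v in variant for v in op.operands):
            hoisted.append(op)
        else:
            remaining.append(op)
            variant.add(op.result)
    return hoisted, remaining
```

For example, with a body that computes `%s = mulf(%sm, %log2e)`, two copies of `exp(%s)`, and `mulf(%qk, %s)` where only `%qk` changes per iteration, marking the `exp` ops impure keeps both copies inside the loop, while marking them pure lets CSE merge them and LICM hoist the result.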

lipracer · Jan 27 '24 17:01