
Bugs in Triton operator?

Open XintianHan opened this issue 1 year ago • 1 comment

Hi. Thanks for the nice Triton implementation. I may have found a bug in the Triton operator: it does not seem to support head dim = 192, although dims 128 and 256 work.

For the example below:

from lightning_attention import lightning_attention
import torch

# shapes: batch, heads, sequence length, head dim
b = 1
h = 16
n = 64
d = 192  # 128 and 256 work; 192 triggers the compile error
q = torch.randn(b, h, n, d).to("cuda")
k = torch.randn(b, h, n, d).to("cuda")
v = torch.randn(b, h, n, d).to("cuda")
slope_rate = torch.ones(h).to("cuda")
output = lightning_attention(
    q, k, v, True, slope_rate.squeeze(-1).squeeze(-1)
)
print("test succeed!")

It gives me the following error:

  File "<string>", line 41, in _fwd_kernel
  File "/home/tiger/.local/lib/python3.9/site-packages/triton/compiler.py", line 1621, in compile
    next_module = compile(module)
  File "/home/tiger/.local/lib/python3.9/site-packages/triton/compiler.py", line 1550, in <lambda>
    lambda src: ast_to_ttir(src, signature, configs[0], constants)),
  File "/home/tiger/.local/lib/python3.9/site-packages/triton/compiler.py", line 963, in ast_to_ttir
    return optimize_triton_ir(mod)
  File "/home/tiger/.local/lib/python3.9/site-packages/triton/compiler.py", line 957, in optimize_triton_ir
    pm.run(mod)
RuntimeError: PassManager::run failed

The failure is triggered by the kernel launch at line 370, in forward:

    _fwd_kernel[grid](

Any advice here?
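
A plausible explanation (an assumption on my part, not confirmed against the kernel source): Triton tile sizes, such as the head-dimension block passed to tl.arange, generally have to be powers of two, so 128 and 256 compile while 192 does not. A quick check of the three dims:

def is_power_of_two(x: int) -> bool:
    # Powers of two have exactly one bit set.
    return x > 0 and (x & (x - 1)) == 0

for d in (128, 192, 256):
    print(d, is_power_of_two(d))  # 128 True, 192 False, 256 True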

XintianHan · Jan 25 '24 12:01

Thank you for your feedback, but I believe this issue would be more appropriately raised at https://github.com/OpenNLPLab/lightning-attention. Could you please open the same issue in the lightning-attention repository? I will follow up on it there. We already have a local version that supports larger head dimensions, and it will be pushed over the weekend.
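
Until that update lands, here is a minimal interim workaround sketch, assuming the only restriction is the power-of-two head dimension: zero-pad q, k, and v along the last dimension to the next power of two, run the kernel, then slice the output back. Zero-padding q and k leaves q @ k^T unchanged, and the padded columns of v only produce extra output columns that are dropped. The call mirrors the reproducer's lightning_attention signature; lightning_attention_padded is a hypothetical helper name.

import torch
import torch.nn.functional as F
from lightning_attention import lightning_attention  # same import as in the reproducer

def lightning_attention_padded(q, k, v, causal, slope_rate):
    d = q.shape[-1]
    d_pad = 1 << (d - 1).bit_length()  # next power of two, e.g. 192 -> 256
    if d_pad != d:
        pad = (0, d_pad - d)  # pad only the last (head) dimension on the right
        q, k, v = (F.pad(x, pad) for x in (q, k, v))
    out = lightning_attention(q, k, v, causal, slope_rate)
    return out[..., :d]  # drop the padded output columns

Calling lightning_attention_padded(q, k, v, True, slope_rate) with the d=192 tensors from the reproducer should then compile, since the kernel only ever sees d=256.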

Doraemonzzz · Jan 25 '24 13:01