Is there a way to turn off the setting to use flash attention/triton library?
I'm trying to run the training script on one of the GUE datasets. I'm using a Google Colab notebook with transformers 4.29. I've been stuck for several hours on this error from the triton library. I tried using pip to install triton versions 2.0, 2.3.1, and 3.0, but they all gave errors similar to the one below.
TypeError: dot() got an unexpected keyword argument 'trans_b'
CompilationError: at 114:14:
    else:
        if EVEN_HEADDIM:
            k = tl.load(k_ptrs + start_n * stride_kn,
                        mask=(start_n + offs_n)[:, None] < seqlen_k,
                        other=0.0)
        else:
            k = tl.load(k_ptrs + start_n * stride_kn,
                        mask=((start_n + offs_n)[:, None] < seqlen_k) &
                             (offs_d[None, :] < headdim),
                        other=0.0)
    qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
    qk += tl.dot(q, k, trans_b=True)
          ^
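From what I can tell, trans_b was a Triton 1.x keyword that was removed in 2.0, which would explain why every 2.x/3.x install fails the same way. If I'm reading the newer API right, the equivalent call transposes explicitly (just my guess at a fix, not something from this repo):

import triton.language as tl  # only meaningful inside a @triton.jit kernel

# Triton >= 2.0 dropped the trans_b keyword from tl.dot;
# the transpose now has to be spelled out with tl.trans:
qk += tl.dot(q, tl.trans(k))  # old: tl.dot(q, k, trans_b=True)

But I'd rather not patch the kernel by hand.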
I think the best approach now would be to disable flash attention entirely, so the model never runs any code involving triton. Some people have reported success by simply uninstalling the library.
I then tried installing version 1.1.1 from source and then uninstalling it, but that didn't work either.
Is there a way to turn off the setting to try to use flash attention?
I had the same issue. First I installed triton 2.0.0, then pip install triton==3.0.0, then I tried following the instructions to install triton in editable mode; all of them led to the same error. After I ran pip uninstall triton, it started working.
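For reference, this is roughly all I ran in a Colab cell before restarting the runtime (the restart matters, otherwise the already-imported triton sticks around):

# Colab cell: remove triton so the model falls back to the plain
# PyTorch attention path instead of compiling the triton kernel.
import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "uninstall", "-y", "triton"])
# Now restart the runtime (Runtime > Restart runtime) and rerun the script.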
Thanks a lot, it works in my case too :)
Just uninstall triton or install triton>=3.0.0; the model will automatically run without triton.
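If you want to double-check the fallback, a minimal sanity check looks like this (the checkpoint name is only an example, substitute the model you are fine-tuning):

import importlib.util
from transformers import AutoModel, AutoTokenizer

# With triton absent, the model's remote code should take the
# non-flash-attention path.
assert importlib.util.find_spec("triton") is None, "triton is still installed"

name = "zhihan1996/DNABERT-2-117M"  # example checkpoint; replace with your own
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True)
print(type(model).__name__)  # loads and runs without flash attention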