Can ZLUDA compile and run FlashAttention?

Open kklemon opened this issue 1 year ago • 1 comment

FlashAttention is an IO-aware, highly optimized implementation of the attention mechanism for NVIDIA GPUs. Since it's implemented in CUDA, it should in principle also compile and run with ZLUDA, right? I could imagine that some required CUDA operations are not implemented yet, or that differences in hardware architecture would result in lower throughput, but apart from this, could it generally run on AMD hardware with reasonable performance?

kklemon avatar Feb 13 '24 13:02 kklemon

I just found the repo a few days ago and haven't tested it yet, but it seems very promising. I hope it does; theoretically it should.

userbox020 avatar Feb 13 '24 21:02 userbox020