ZLUDA
Can ZLUDA compile and run FlashAttention?
FlashAttention is an IO-aware, highly optimized implementation of the attention mechanism for NVIDIA GPUs. Since it's implemented in CUDA, it should in principle also compile and run with ZLUDA, right? I could imagine that some required CUDA operations are not implemented yet, or that differences in hardware architecture would result in lower throughput, but apart from that, could it generally run on AMD hardware with reasonable performance?
I just found the repo a few days ago and haven't tested it yet, but it seems very promising. I hope it does; theoretically, it should.