Tri Dao
Does `nvcc -V` run?
As mentioned in the README, we require CUDA (and nvcc) >= 11.6.
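A rough way to script this check, as a sketch only: it assumes `nvcc -V` prints a line of the form `release X.Y` (typical of recent toolkits), and the helper names `parse_nvcc_release`, `nvcc_release` and `meets_minimum` are made up for illustration.

```python
import re
import shutil
import subprocess

MIN_CUDA = (11, 6)  # minimum version stated in the comment above


def parse_nvcc_release(text):
    """Return (major, minor) parsed from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def nvcc_release():
    """Run `nvcc -V` if nvcc is on PATH; return (major, minor) or None."""
    if shutil.which("nvcc") is None:
        return None  # nvcc is not installed or not on PATH
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def meets_minimum(release, minimum=MIN_CUDA):
    """True if a (major, minor) release is at least the required minimum."""
    return release is not None and release >= minimum


if __name__ == "__main__":
    release = nvcc_release()
    if release is None:
        print("nvcc not found, or its version could not be parsed")
    else:
        print("nvcc release:", release, "ok:", meets_minimum(release))
```

Tuple comparison does the version ordering here, so `(12, 3)` passes and `(11, 5)` does not.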
Thanks for this contribution! What happens if the user calls a function that's not currently supported (e.g. paged KV or varlen)?
No mistake
12.3 will work
CUDA minor versions are compatible.
@rocking5566 does the AMD version support alibi?
I personally have no bandwidth for that, so we'd need folks to contribute.
What's the difference? The right comparison is (flashattn in fp16 - reference implementation in fp32) vs (reference implementation in fp16 - reference implementation in fp32).
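A minimal sketch of that comparison in plain Python, under several assumptions: Python floats (float64) stand in for the fp32 reference, the half-precision format in `struct` emulates fp16 rounding, and `candidate` is a hypothetical stand-in for the kernel output (high-precision math, rounded once at the end), not a call into flash-attn. The helper names and the factor of 2 in `within_tolerance` are illustrative choices.

```python
import math
import random
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def attention(q, k, v, rnd=lambda x: x):
    """Naive single-head softmax attention on lists of lists.

    `rnd` is applied to the scores, probabilities and outputs, which
    emulates storing those intermediates in a lower precision.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [rnd(scale * sum(a * b for a, b in zip(qi, kj))) for kj in k]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [rnd(e / total) for e in exps]
        out.append([rnd(sum(p * vj[c] for p, vj in zip(probs, v)))
                    for c in range(len(v[0]))])
    return out


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def within_tolerance(candidate_err, baseline_err, factor=2.0):
    """Accept the candidate if its error is at most `factor` times the baseline's."""
    return candidate_err <= factor * baseline_err


random.seed(0)
seq_len, head_dim = 16, 8


def rand_fp16_matrix():
    # All three implementations see the same fp16-rounded inputs.
    return [[to_fp16(random.uniform(-1.0, 1.0)) for _ in range(head_dim)]
            for _ in range(seq_len)]


q, k, v = rand_fp16_matrix(), rand_fp16_matrix(), rand_fp16_matrix()

reference = attention(q, k, v)              # high-precision reference
baseline = attention(q, k, v, rnd=to_fp16)  # same algorithm, fp16 intermediates
# Hypothetical stand-in for the kernel under test: rounded only at the output.
candidate = [[to_fp16(x) for x in row] for row in reference]

candidate_err = max_abs_diff(candidate, reference)
baseline_err = max_abs_diff(baseline, reference)
print("candidate error:", candidate_err)
print("baseline error:", baseline_err)
print("acceptable:", within_tolerance(candidate_err, baseline_err))
```

The point of the baseline term is that some fp16 error is unavoidable, so the kernel's error is judged relative to what a plain fp16 implementation already incurs rather than against zero.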
No, that's not implemented (one would have to change the backward pass code to compute the gradient of the slopes).