Tri Dao
https://github.com/Dao-AILab/flash-attention/blob/b517a592049ed81a4cf9ad3aa4b4a7372e9d9a56/flash_attn/cute/flash_fwd_sm100.py
> Thanks! Sorry this is a stupid question.
>
> But to use it on B200s, what would I have to do? I followed this:
>
> ```
> cd...
> ```
I'm hearing aarch64 wheels will be coming soon (on the order of weeks).
Please look at existing issues on numerical error. The right thing to compare is (flashattn in fp16 - reference attn in fp32) vs (reference attn in fp16 - reference attn in fp32); a sketch of that comparison is below.
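A minimal sketch of that comparison, assuming `flash_attn_func` from the flash-attn package and PyTorch's `scaled_dot_product_attention` as the reference; the tensor shapes are illustrative, not prescriptive:

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_func

# Illustrative shapes: (batch, seqlen, nheads, headdim), as flash_attn_func expects.
q, k, v = (torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

def ref_attn(q, k, v, dtype):
    # SDPA expects (batch, nheads, seqlen, headdim), so transpose in and out.
    qt, kt, vt = (t.transpose(1, 2).to(dtype) for t in (q, k, v))
    return F.scaled_dot_product_attention(qt, kt, vt).transpose(1, 2)

out_ref_fp32 = ref_attn(q, k, v, torch.float32)          # "ground truth"
out_ref_fp16 = ref_attn(q, k, v, torch.float16).float()  # baseline fp16 error
out_flash = flash_attn_func(q, k, v).float()

# FlashAttention is numerically fine if its error against the fp32 reference
# is comparable to the error a plain fp16 implementation already incurs.
err_flash = (out_flash - out_ref_fp32).abs().max()
err_fp16 = (out_ref_fp16 - out_ref_fp32).abs().max()
print(f"flash vs fp32 ref: {err_flash:.3e}, fp16 ref vs fp32 ref: {err_fp16:.3e}")
```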
https://pytorch.org/tutorials/recipes/recipes/benchmark.html
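For instance, a minimal sketch in the style of that recipe using `torch.utils.benchmark`, which handles CUDA synchronization and warmup for you (the matmul workload here is just a placeholder):

```python
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

t = benchmark.Timer(
    stmt="torch.matmul(x, x)",
    globals={"x": x, "torch": torch},
)
# blocked_autorange picks the number of iterations automatically.
print(t.blocked_autorange(min_run_time=1.0))
```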
It's a beta release for now, we're doing more extensive testing before including it in the wheels.
Not yet. PRs are welcome.
[Triton tutorials](https://triton-lang.org/main/getting-started/tutorials/index.html) are a good place to start to learn about how tensors are laid out in memory, and how to read & write to them. After that you can...
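As a starting point, a minimal sketch in the spirit of the first Triton tutorial: each program instance computes its block of offsets, loads a masked chunk from global memory, and stores the result.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(98432, device="cuda")
y = torch.randn_like(x)
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```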
Can you say what steps are required to reproduce this?
Probably. You can search GitHub issues to see.