Tri Dao

Results 438 comments of Tri Dao

https://github.com/Dao-AILab/flash-attention/blob/b517a592049ed81a4cf9ad3aa4b4a7372e9d9a56/flash_attn/cute/flash_fwd_sm100.py

> Thanks! Sorry this is a stupid question.
>
> But to use it on b200s, what would i have to do? I followed this:
>
> ```
> cd...
> ```

I'm hearing aarch64 wheels will be coming soon (on the order of weeks).

Please look at existing issues on numerical error. The right thing to compare is (flashattn in fp16 - reference attn in fp32) vs (reference attn in fp16 - reference attn in fp32).
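
A minimal sketch of that comparison, assuming the public `flash_attn_func` interface; the shapes, sizes, and use of max absolute error here are illustrative choices, not a prescribed test:

```python
import torch
from flash_attn import flash_attn_func  # assumes the standard flash_attn interface

def reference_attn(q, k, v):
    # Plain attention in whatever dtype q/k/v are in.
    scale = q.shape[-1] ** -0.5
    # (batch, seqlen, nheads, headdim) -> (batch, nheads, seqlen, headdim)
    q, k, v = [x.transpose(1, 2) for x in (q, k, v)]
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    out = torch.matmul(scores.softmax(dim=-1), v)
    return out.transpose(1, 2)

batch, seqlen, nheads, headdim = 2, 512, 8, 64
q, k, v = [torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
           for _ in range(3)]

out_ref_fp32 = reference_attn(q.float(), k.float(), v.float())  # "ground truth"
out_ref_fp16 = reference_attn(q, k, v)                          # reference attn in fp16
out_flash    = flash_attn_func(q, k, v)                         # flash-attn in fp16

err_flash = (out_flash.float() - out_ref_fp32).abs().max().item()
err_ref   = (out_ref_fp16.float() - out_ref_fp32).abs().max().item()
print(f"flash fp16 vs ref fp32: {err_flash:.3e}")
print(f"ref   fp16 vs ref fp32: {err_ref:.3e}")
```

If the two errors are of similar magnitude, flash-attn is behaving as expected; the difference you see is just fp16 roundoff, not a bug.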

It's a beta release for now, we're doing more extensive testing before including it in the wheels.

[Triton tutorials](https://triton-lang.org/main/getting-started/tutorials/index.html) are a good place to start to learn about how tensors are laid out in memory, and how to read & write to them. After that you can...
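
To give a flavor of what those tutorials cover, here is a minimal Triton kernel in the spirit of the first tutorial (the names and block size are illustrative): each program instance loads a contiguous block of elements from global memory, adds two vectors, and writes the result back.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                     # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)         # read from global memory
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)   # write the result back

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```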

Can you say what steps are required to reproduce this?