Varuna Jayasiri
Results: 2 issues
Remove in-place add of eps. Use in-place div for improved performance
In the flash attention example, keep the max of the previous `scores_max` and `max(acc_s)` in `scores_max` for numerical stability. From the [FlashAttention paper](https://arxiv.org/pdf/2205.14135), Algorithm 1:

$$m_i^{\text{new}} = \max(m_i, \tilde{m}_{ij})$$

...
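To show why the running max has to combine the previous max with the current block max, here is a minimal plain-Python sketch of the online-softmax update for a single query row. It is not taken from the flash attention example. The function name `online_softmax_weighted_sum` and the `block_size` parameter are hypothetical, and `m` is only assumed to play the role that `scores_max` plays there.

```python
import math
from typing import List


def online_softmax_weighted_sum(scores: List[float], values: List[float], block_size: int) -> float:
    """Compute sum_j softmax(scores)_j * values[j], processing scores block by block.

    Hypothetical helper: keeps a running max m (assumed analogue of scores_max),
    a running normalizer l, and a running accumulator acc, using the
    m_i^new = max(m_i, m~_ij) update from Algorithm 1.
    """
    m = float("-inf")  # running max over every score seen so far
    l = 0.0            # running sum of exp(score - m)
    acc = 0.0          # running sum of exp(score - m) * value
    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]
        m_blk = max(s_blk)           # m~_ij: max of the current block only
        m_new = max(m, m_blk)        # keep the larger of the previous and current max
        scale = math.exp(m - m_new)  # <= 1; equals exp(-inf) == 0.0 on the first block
        p = [math.exp(s - m_new) for s in s_blk]  # every exponent is <= 0
        l = l * scale + sum(p)
        acc = acc * scale + sum(pi * vi for pi, vi in zip(p, v_blk))
        m = m_new
    return acc / l
```

For example, `online_softmax_weighted_sum([1000.0, 0.0, 1001.0], [1.0, 2.0, 3.0], 1)` matches a direct max-subtracted softmax. If `m_new` were set to `m_blk` alone, the rescale factor `exp(m - m_blk)` would become `exp(1000)` on the second block and overflow. Taking the max keeps every exponent at or below zero.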