dlrover
How to use Flash Attention x TFPlus?
Flash Attention x TFPlus looks like nice work, but how do I use it? Could you please provide an example or a unit test for Flash Attention?
You can refer to this commit, which shows how to use MultiHeadAttention the same way as in TensorFlow: https://github.com/intelligent-machine-learning/dlrover/pull/850
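For anyone landing here before reading the PR, here is a rough sketch of what "like in TensorFlow" usually means. It uses the stock `tf.keras.layers.MultiHeadAttention` layer, not the TFPlus one. The TFPlus import path and any Flash Attention specific arguments are not shown in this thread, so treat it as an assumption that the TFPlus layer takes the same constructor and call arguments; check the PR for the actual module name. The shapes and sizes below are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Stock Keras layer, used here as a stand-in. Assumption: the TFPlus
# MultiHeadAttention from the linked PR is called the same way.
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)

# Illustrative sizes only (not taken from the PR).
batch, seq_len, hidden = 2, 8, 64
x = tf.constant(np.random.rand(batch, seq_len, hidden), dtype=tf.float32)

# Self-attention: the same tensor is passed as query and value
# (key defaults to value when it is not given).
out = mha(query=x, value=x)

# With the default output_shape, the result keeps the query's shape.
out_shape = tuple(out.shape)
print(out_shape)
```

If the TFPlus layer really is call-compatible, swapping it in should only require changing the line that constructs `mha`.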