cutlass
Add Faster Neighborhood Attention to PUBLICATIONS
Adds "Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level" to publications.
TL;DR: Neighborhood attention requires treating the attention problem as two batched GETTs instead of GEMMs, because the row mode can be 2-D or 3-D rather than the single dimension used in NLP.
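To illustrate the GEMM vs. GETT distinction, here is a minimal pure-Python sketch of the first of the two contractions only (the query-key scores). It is not the CUTLASS kernel or the paper's method; the function names, the `radius` parameter, and the simplified window (truncated at the borders) are assumptions made for illustration.

```python
def dot(a, b):
    """Contract two vectors over the head dimension D."""
    return sum(x * y for x, y in zip(a, b))

def scores_gemm(q, k):
    """1-D tokens: q, k are [N][D]; the row mode is a single dimension N.
    Returns scores[N][N], i.e. an ordinary GEMM (Q times K transposed)."""
    return [[dot(qi, kj) for kj in k] for qi in q]

def scores_gett_2d(q, k):
    """2-D tokens: q, k are [H][W][D]; the row mode is the pair (H, W).
    Returns scores[H][W][H][W], a tensor contraction over D."""
    return [[[[dot(q_ij, k_xy) for k_xy in k_row] for k_row in k]
             for q_ij in q_row] for q_row in q]

def neighborhood_scores_2d(q, k, radius):
    """Hypothetical simplified neighborhood: query (i, j) only scores keys
    (x, y) with |x - i| <= radius and |y - j| <= radius, truncated at the
    borders. The window is defined in (H, W) coordinates, which is why the
    row mode has to stay multi-dimensional."""
    H, W = len(q), len(q[0])
    out = {}
    for i in range(H):
        for j in range(W):
            out[(i, j)] = {
                (x, y): dot(q[i][j], k[x][y])
                for x in range(max(0, i - radius), min(H, i + radius + 1))
                for y in range(max(0, j - radius), min(W, j + radius + 1))
            }
    return out

# Small deterministic inputs (illustrative values only).
H, W, D = 3, 4, 2
q = [[[float(i + j + d) for d in range(D)] for j in range(W)] for i in range(H)]
k = [[[float(i * j - d) for d in range(D)] for j in range(W)] for i in range(H)]

full = scores_gett_2d(q, k)
flat = scores_gemm([t for row in q for t in row], [t for row in k for t in row])
local = neighborhood_scores_2d(q, k, radius=1)
```

Without the neighborhood restriction, flattening (H, W) into one dimension recovers the GEMM result exactly. Once each query is limited to a spatial window, the window depends on the separate H and W coordinates, so in this sketch the flattened single-dimension view is no longer a natural fit.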
This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.