
Add Faster Neighborhood Attention to PUBLICATIONS

Open • alihassanijr opened this issue 1 year ago • 1 comment

Adds "Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level" to publications.

TL;DR: Neighborhood attention requires treating the attention problem as two batched GETTs (general tensor-tensor contractions) instead of GEMMs, because the row mode can be 2-D or 3-D rather than a single dimension as in NLP.
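To make the GEMM vs. GETT distinction concrete, here is a minimal pure-Python sketch of the first of the two contractions (the attention logits, Q times K transposed) for tokens laid out on a 2-D grid. It is an illustration only, not the CUTLASS or paper implementation: the function names, the toy shapes, and the `radius` window are all assumptions, and the window is a simplified one that just drops out-of-range keys rather than reproducing the paper's border handling.

```python
from itertools import product


def dot(a, b):
    """Inner product over the head dimension D (the contracted mode)."""
    return sum(x * y for x, y in zip(a, b))


def gemm_logits(q, k):
    """1-D case: q and k are [N][D]; the row mode is the single index n."""
    return [[dot(qi, kj) for kj in k] for qi in q]


def gett_logits_2d(q, k, radius=None):
    """2-D case: q and k are [H][W][D]; the row mode is the pair (i, j).

    Returns a dict keyed by (i, j, x, y). If `radius` is given, only keys
    (x, y) within `radius` of the query (i, j) along each axis are kept
    (hypothetical simplified neighborhood, no border correction).
    """
    H, W = len(q), len(q[0])
    out = {}
    for i, j, x, y in product(range(H), range(W), range(H), range(W)):
        if radius is not None and (abs(i - x) > radius or abs(j - y) > radius):
            continue
        out[i, j, x, y] = dot(q[i][j], k[x][y])
    return out


# Toy inputs: a 2 x 3 grid of tokens with head dimension 2.
H, W, D = 2, 3, 2
q = [[[float(i + j + d) for d in range(D)] for j in range(W)] for i in range(H)]
k = [[[float(i * j - d) for d in range(D)] for j in range(W)] for i in range(H)]

# Without a neighborhood, flattening (i, j) -> i * W + j recovers a plain GEMM.
flat_q = [q[i][j] for i in range(H) for j in range(W)]
flat_k = [k[i][j] for i in range(H) for j in range(W)]
full = gett_logits_2d(q, k)
flat = gemm_logits(flat_q, flat_k)
matches_gemm = all(
    full[i, j, x, y] == flat[i * W + j][x * W + y] for (i, j, x, y) in full
)

# With a neighborhood, the kept keys depend on the 2-D coordinates themselves,
# so the multi-dimensional row mode has to stay visible to the kernel.
local = gett_logits_2d(q, k, radius=1)
```

In this sketch the unrestricted case collapses to a GEMM after flattening, while the windowed case is defined in terms of `(i, j)` and `(x, y)` directly, which is the motivation for keeping the row mode multi-dimensional.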

alihassanijr • Apr 11 '24 01:04

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] • May 11 '24 01:05