Cream
Rethinking and Improving Relative Position Encoding for Vision Transformer with memory-optimized attentions
Hello, I was wondering whether your relative position encoding schemes would work with approximate attention mechanisms, for example the one presented in FlashAttention (https://arxiv.org/abs/2205.14135).
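For concreteness, here is roughly the kind of combination I have in mind: an additive relative position bias (bias-mode RPE) applied inside a fused, memory-efficient attention kernel via PyTorch's `scaled_dot_product_attention`. This is just a rough sketch with placeholder names (`rel_pos_bias`, the shapes, etc.), not your iRPE implementation, and it only covers the bias case, not the contextual RPE modes.

```python
# Minimal sketch (assumption: a bias-mode RPE, not the iRPE contextual modes).
# The shapes and the rel_pos_bias tensor below are placeholders.
import torch
import torch.nn.functional as F

B, H, N, D = 2, 8, 196, 64          # batch, heads, tokens (14x14 patches), head dim

q = torch.randn(B, H, N, D)
k = torch.randn(B, H, N, D)
v = torch.randn(B, H, N, D)

# In a real model this would come from a learned table indexed by relative
# position; here it is random and only illustrates the (1, H, N, N) shape.
rel_pos_bias = torch.randn(1, H, N, N)

# attn_mask accepts a float tensor that is added to the attention logits,
# so the bias is applied inside the fused kernel without materializing
# softmax(QK^T) in HBM. Note: the dedicated FlashAttention backend may not
# accept an arbitrary bias, in which case PyTorch can fall back to its
# memory-efficient backend.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=rel_pos_bias)
print(out.shape)  # torch.Size([2, 8, 196, 64])
```

The open question is whether the contextual RPE variants, which modify the query/key interaction itself rather than adding a precomputed bias, can be expressed inside such a fused kernel.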
Thanks for your attention to our work!
Let me read the paper and check whether RPE works with approximate attention mechanisms.