Cream
Rethinking and Improving Relative Position Encoding for Vision Transformer with memory-optimized attentions
Hello, I was wondering whether your relative position encoding schemes would work with approximate attention mechanisms, for example the one presented in FlashAttention (https://arxiv.org/abs/2205.14135).
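For concreteness, here is roughly the kind of combination I have in mind: an additive relative position bias (bias-mode RPE) applied inside a fused, memory-efficient attention kernel via PyTorch's `scaled_dot_product_attention`. This is just a rough sketch with placeholder names (`rel_pos_bias`, the shapes, etc.), not your iRPE implementation, and it only covers the bias case, not the contextual RPE modes.

```python
# Minimal sketch (assumption: a bias-mode RPE, not the iRPE contextual modes).
# The shapes and the rel_pos_bias tensor below are placeholders.
import torch
import torch.nn.functional as F

B, H, N, D = 2, 8, 196, 64          # batch, heads, tokens (14x14 patches), head dim

q = torch.randn(B, H, N, D)
k = torch.randn(B, H, N, D)
v = torch.randn(B, H, N, D)

# In a real model this would come from a learned table indexed by relative
# position; here it is random and only illustrates the (1, H, N, N) shape.
rel_pos_bias = torch.randn(1, H, N, N)

# attn_mask accepts a float tensor that is added to the attention logits,
# so the bias is applied inside the fused kernel without materializing
# softmax(QK^T) in HBM. Note: the dedicated FlashAttention backend may not
# accept an arbitrary bias, in which case PyTorch can fall back to its
# memory-efficient backend.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=rel_pos_bias)
print(out.shape)  # torch.Size([2, 8, 196, 64])
```

The open question is whether the contextual RPE variants, which modify the query/key interaction itself rather than adding a precomputed bias, can be expressed inside such a fused kernel.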
Thanks for your attention to our work!
Let me read the paper and check whether RPE works with approximate attention mechanisms.