Attention-Augmented-Conv2d
Memory/Time Complexity of the relative positional encoding
Thanks for your project.
I have some questions about the implementation of the relative positional encoding. According to your implementation, the memory cost is O(H^2 W^2), while the paper states that it is optimized down to O(HW).
Besides, I have also tried your method on a semantic segmentation task and found it to be very slow and to consume a huge amount of memory.
I am wondering whether you have addressed these memory and time issues.
Thanks for your comment!
- memory cost
- I thought the conventional relative position encoding costs O(H^2 W^2) because it materializes a full pairwise matrix over all H×W positions. The current code, however, is O(HW) because it only stores 1D embedding vectors of length H and W.
- Time issues
- I'll fix it as soon as possible. Thank you!
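For reference, the O(HW)-parameter scheme discussed above can be sketched as follows. This is a minimal, illustrative PyTorch version of the "rel_to_abs" padding/reshaping trick from the Attention Augmented Convolutional Networks paper, shown here along one axis only; the function and variable names are my own and may not match this repository's code.

```python
import torch

def rel_to_abs(x):
    """Convert relative-position logits (B, Nh, L, 2L-1) into
    absolute-position logits (B, Nh, L, L) without ever building a
    per-pair embedding table.
    """
    B, Nh, L, _ = x.shape
    # Append a zero column so each row has even length 2L.
    x = torch.cat([x, x.new_zeros(B, Nh, L, 1)], dim=3)
    # Flatten, then pad with L-1 zeros so the next reshape shifts
    # every row by one relative offset.
    flat = x.reshape(B, Nh, L * 2 * L)
    flat = torch.cat([flat, flat.new_zeros(B, Nh, L - 1)], dim=2)
    # Reshape to (L+1, 2L-1); slicing from column L-1 onward now
    # yields out[i, j] = x[i, j - i + L - 1].
    final = flat.reshape(B, Nh, L + 1, 2 * L - 1)
    return final[:, :, :L, L - 1:]

# Demo: logits are built from a 1D relative embedding with 2L-1 rows
# (O(L) parameters) rather than a full LxL pairwise table (O(L^2)).
L, dk = 4, 8
q = torch.randn(1, 1, L, dk)           # (B, Nh, L, dk) query slices
key_rel = torch.randn(2 * L - 1, dk)   # one embedding per relative offset
rel_logits = torch.einsum('bhld,md->bhlm', q, key_rel)  # (B, Nh, L, 2L-1)
abs_logits = rel_to_abs(rel_logits)    # (B, Nh, L, L)
```

Applying this once along width (length W) and once along height (length H) keeps the embedding parameters at O(H + W) and avoids the O(H^2 W^2) pairwise table, although the attention logits themselves are still (HW)×(HW), which is likely why segmentation-sized inputs remain slow and memory-heavy.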