Adds feature pyramid attention (FPA) module, resolves #167
For #167.
Adds Feature Pyramid Attention (FPA) module :boom: :rocket:
Pyramid Attention Network for Semantic Segmentation https://arxiv.org/abs/1805.10180

Figure 2 from https://arxiv.org/abs/1805.10180

Figure 3 from https://arxiv.org/abs/1805.10180
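
For reference, a rough sketch of how the FPA block from Figure 3 could look in PyTorch; the `conv_bn_relu` helper, the channel arguments and the exact up/downsampling choices are my assumptions here, not necessarily what this PR ships:

```python
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(c_in, c_out, kernel_size, stride=1):
    # conv -> batchnorm -> relu with "same" padding
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size, stride=stride, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


class FeaturePyramidAttention(nn.Module):
    """Feature Pyramid Attention after https://arxiv.org/abs/1805.10180, Figure 3."""

    def __init__(self, c_in, c_out):
        super().__init__()

        # main branch: 1x1 conv on the encoder's bottleneck features
        self.main = conv_bn_relu(c_in, c_out, kernel_size=1)

        # global pooling branch: image-level context added back at the end
        self.glob = conv_bn_relu(c_in, c_out, kernel_size=1)

        # pyramid branch: downsample with 7x7, 5x5, 3x3 convs of stride two
        self.down1 = conv_bn_relu(c_in, c_out, kernel_size=7, stride=2)
        self.down2 = conv_bn_relu(c_out, c_out, kernel_size=5, stride=2)
        self.down3 = conv_bn_relu(c_out, c_out, kernel_size=3, stride=2)

        # one more conv per pyramid level before merging back up
        self.conv1 = conv_bn_relu(c_out, c_out, kernel_size=7)
        self.conv2 = conv_bn_relu(c_out, c_out, kernel_size=5)
        self.conv3 = conv_bn_relu(c_out, c_out, kernel_size=3)

    def forward(self, x):
        h, w = x.shape[2:]

        main = self.main(x)

        glob = self.glob(F.adaptive_avg_pool2d(x, 1))
        glob = F.interpolate(glob, size=(h, w), mode="bilinear", align_corners=False)

        # build the pyramid top-down ...
        d1 = self.down1(x)
        d2 = self.down2(d1)
        d3 = self.down3(d2)

        # ... then merge bottom-up, upsampling and adding level by level
        p3 = self.conv3(d3)
        p2 = self.conv2(d2) + F.interpolate(p3, size=d2.shape[2:], mode="bilinear", align_corners=False)
        p1 = self.conv1(d1) + F.interpolate(p2, size=d1.shape[2:], mode="bilinear", align_corners=False)
        attention = F.interpolate(p1, size=(h, w), mode="bilinear", align_corners=False)

        # the pyramid output acts as pixel-wise attention on the main branch,
        # global context gets added on top
        return main * attention + glob
```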
Tasks
- [x] add after encoder and before decoder (see the wiring sketch after this list)
- [ ] benchmark with and without fpa module
- [ ] experiment with the paper's GAU modules to replace our decoder
- [ ] experiment with scse in our fpn #75
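
To make the first item above concrete, here is a minimal wiring sketch; the encoder/decoder interface used here (a list of feature maps, coarsest last) is an assumption about our model code, not its actual API:

```python
import torch.nn as nn


class SegmentationModel(nn.Module):
    """Hypothetical wiring: encoder -> FPA on the bottleneck -> decoder."""

    def __init__(self, encoder, attention, decoder):
        super().__init__()
        self.encoder = encoder
        self.attention = attention  # e.g. the FeaturePyramidAttention sketched above
        self.decoder = decoder

    def forward(self, x):
        features = self.encoder(x)                   # assumed: feature maps, coarsest last
        features[-1] = self.attention(features[-1])  # attend only on the bottleneck
        return self.decoder(features)
```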
@ocourtin maybe this is interesting to you :)
By now we have https://arxiv.org/abs/1904.11492, which not only compares various attention mechanisms but also comes up with a framework for visual attention and proposes a new Global Context block within this framework.
I've implemented:
- Self-attention (as in SAGAN, BIGGAN, etc.)
- Simple self-attention (see paper above)
- Global Context block (see paper above)
for my 3d video models in https://github.com/moabitcoin/ig65m-pytorch/blob/706c9e737e42d98086b3af24548fb2bb6a7dc409/ig65m/attention.py#L9-L103.
For the 2d segmentation case here we can adapt the 3d code and then e.g. use a couple of Global Context blocks on top of the last (high-level) resnet feature blocks.
Figure from https://arxiv.org/abs/1904.11492
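
For the 2d segmentation case that could look roughly like the sketch below; I adapted it by hand from the 3d code linked above, so the name `GlobalContext2d` and the reduction ratio are placeholders and not the actual ig65m implementation:

```python
import torch
import torch.nn as nn


class GlobalContext2d(nn.Module):
    """Global Context block after https://arxiv.org/abs/1904.11492 (2d variant)."""

    def __init__(self, channels, reduction=4):
        super().__init__()

        # context modeling: a 1x1 conv scores every spatial position
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)

        # transform: channel bottleneck with LayerNorm, as in the paper
        hidden = channels // reduction
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        n, c, h, w = x.shape

        # softmax over all positions -> one global context vector per sample
        weights = self.attn(x).view(n, 1, h * w).softmax(dim=-1)           # n x 1 x hw
        context = torch.bmm(x.view(n, c, h * w), weights.transpose(1, 2))  # n x c x 1
        context = context.view(n, c, 1, 1)

        # broadcast the transformed context back onto every position
        return x + self.transform(context)
```

Dropping one or two of these on top of the last resnet blocks should be cheap to try.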