Adds squeeze and excitation (scSE) modules, resolves #157
For https://github.com/mapbox/robosat/issues/157.
Adds scSE modules :boom: :rocket:
https://arxiv.org/abs/1709.01507
Squeeze-and-Excitation Networks
https://arxiv.org/abs/1803.02579
Concurrent Spatial and Channel 'Squeeze & Excitation' in Fully Convolutional Networks
(Figure from https://arxiv.org/abs/1803.02579)
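For reference, a minimal PyTorch sketch of a concurrent scSE block as described in the second paper (class name and the `reduction` default are illustrative, not necessarily what this PR uses):

```python
import torch
import torch.nn as nn


class SCSE(nn.Module):
    """Concurrent spatial and channel squeeze & excitation (scSE),
    after https://arxiv.org/abs/1803.02579. Sketch only."""

    def __init__(self, channels, reduction=16):
        super().__init__()

        # Channel squeeze & excitation (cSE): global pool, bottleneck, sigmoid gate per channel
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

        # Spatial squeeze & excitation (sSE): 1x1 conv to a single-channel spatial gate
        self.sse = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Recalibrate features channel-wise and spatially, then combine the two paths
        return x * self.cse(x) + x * self.sse(x)
```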
Tasks
- [x] add to encoder and decoder
- [x] benchmark with and without scse modules
- [ ] experiment with scse in our fpn https://github.com/mapbox/robosat/pull/75
@ocourtin maybe this is interesting to you :)
Just added the scSE modules to our encoders and decoders following the paper's recommendation.
Let's see if this thing goes :rocket:
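Roughly how the blocks get wired in (a hypothetical sketch only; `DecoderBlock` and its layers are placeholders and not robosat's actual modules, and it reuses the `SCSE` sketch from above):

```python
import torch.nn as nn


class DecoderBlock(nn.Module):
    """Hypothetical decoder block with an scSE module appended after the convs."""

    def __init__(self, in_channels, out_channels):
        super().__init__()

        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            # Recalibrate the block's features before handing them to the next stage;
            # SCSE as sketched above
            SCSE(out_channels),
        )

    def forward(self, x):
        return self.block(x)
```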
What I'm seeing in benchmarks so far is consistently better performance (+4-6 percentage points) for an incredibly small increase in computational cost. I will run some more benchmarks over the next days, but if nothing wild happens it'd be best to get this in. Fascinating results, love it!
@ocourtin maybe you want to give it a try, too, if you have the time and a dataset to benchmark it on.
Also, what a great name :ok_hand:
@daniel-j-h Thanks for this!
I gave it a quick try (with robosat.pink), and so far I'm not able to see a significant improvement in the metrics from the scSE blocks.
Will try harder...
@ocourtin did you find the time to try this branch again? I'm seeing improvements from the scSE blocks at almost no cost when training on my large datasets. It would be great if we could confirm this; otherwise I'm hesitant to just merge it in.
By now we have https://arxiv.org/abs/1904.11492, which not only compares various attention mechanisms but also comes up with a framework for visual attention and proposes a new global context block within that framework.
I've implemented
- Self-attention (as in SAGAN, BIGGAN, etc.)
- Simple self-attention (see paper above)
- Global Context block (see paper above)
for my 3D video models in https://github.com/moabitcoin/ig65m-pytorch/blob/706c9e737e42d98086b3af24548fb2bb6a7dc409/ig65m/attention.py#L9-L103
For the 2D segmentation case here, we can adapt the 3D code and then, e.g., use a couple of global context blocks on top of the last (high-level) ResNet feature blocks.
(Figure from https://arxiv.org/abs/1904.11492)
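A rough sketch of what such a 2D global context block could look like, following the paper's add-fusion variant (module name, reduction factor, and placement are assumptions, not part of this PR):

```python
import torch
import torch.nn as nn


class GlobalContext(nn.Module):
    """Global context block (https://arxiv.org/abs/1904.11492), 2D sketch."""

    def __init__(self, channels, reduction=16):
        super().__init__()

        # Context modeling: a single attention map over all spatial positions
        self.mask = nn.Conv2d(channels, 1, kernel_size=1)

        # Transform: bottleneck with layer norm; result is fused back via addition
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        n, c, h, w = x.size()

        # Softmax over spatial positions gives a global attention distribution
        attn = self.mask(x).view(n, 1, h * w).softmax(dim=-1)           # n x 1 x hw

        # Aggregate features into one global context vector per image
        context = torch.bmm(x.view(n, c, h * w), attn.transpose(1, 2))  # n x c x 1
        context = context.view(n, c, 1, 1)

        # Add the transformed context back onto every spatial position
        return x + self.transform(context)
```

Placed after the last ResNet stages, this adds a per-image global context vector back onto every location, which is the behavior the paper argues most non-local blocks degenerate to anyway.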