BottleneckTransformers
Bottleneck Transformers for Visual Recognition
Update 2021/03/14
- Added support for multi-head attention
Experiments
Model | Heads | Params (M) | Acc (%) |
---|---|---|---|
ResNet50 baseline (ref) | - | 23.5 | 93.62 |
BoTNet-50 | 1 | 18.8 | 95.11 |
BoTNet-50 | 4 | 18.8 | 95.78 |
BoTNet-S1-50 | 1 | 18.8 | 95.67 |
BoTNet-S1-59 | 1 | 27.5 | 95.98 |
BoTNet-S1-77 | 1 | 44.9 | wip |
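A figure like the Params (M) column can be checked for any PyTorch model by summing the sizes of its parameter tensors. The sketch below is a minimal illustration: `count_params_m` is a hypothetical helper that is not part of this repository, and whether the table counts only trainable parameters is an assumption. A small stand-in layer is used so the snippet runs on its own.

```python
import torch.nn as nn


def count_params_m(model: nn.Module) -> float:
    """Return the number of trainable parameters, in millions."""
    total = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total / 1e6


# Stand-in module: 1000 * 1000 weights plus 1000 biases = 1,001,000 parameters.
layer = nn.Linear(1000, 1000)
print(count_params_m(layer))  # 1.001
```

Passing one of the repository's models to the same helper should give a number comparable to the table, assuming the table was produced in a similar way.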
Summary

Usage (example)
- Model
import torch
from model import ResNet50

model = ResNet50(num_classes=1000, resolution=(224, 224))
x = torch.randn([2, 3, 224, 224])  # batch of two 224x224 RGB images
print(model(x).size())  # shape of the model output
- Module
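from model import MHSA

planes = 512  # channel count of the input feature map (illustrative value, not taken from the repository)
resolution = 14  # spatial size (height and width) of the feature map
mhsa = MHSA(planes, width=resolution, height=resolution)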
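For readers who want to see what a 2D multi-head self-attention layer of this kind roughly computes, here is a minimal sketch. It is not the repository's `MHSA`: the class name `SketchMHSA`, its argument names, and the default of 4 heads are assumptions made for illustration. It also uses a simple learned per-position embedding (a height part plus a width part), which is a simplification and may differ from both the paper's relative position encoding and the repository's implementation.

```python
import torch
import torch.nn as nn


class SketchMHSA(nn.Module):
    """Illustrative 2D multi-head self-attention (hypothetical, not the repo's MHSA)."""

    def __init__(self, channels, height, width, heads=4):
        super().__init__()
        assert channels % heads == 0, "channels must be divisible by heads"
        self.heads = heads
        self.head_dim = channels // heads
        # 1x1 convolutions produce per-pixel queries, keys and values.
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learned position embeddings, factorized into a height and a width part.
        self.rel_h = nn.Parameter(torch.randn(1, heads, self.head_dim, height, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, self.head_dim, 1, width))

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w  # number of spatial positions
        # Split channels into heads and flatten space: (b, heads, head_dim, n).
        q = self.query(x).reshape(b, self.heads, self.head_dim, n)
        k = self.key(x).reshape(b, self.heads, self.head_dim, n)
        v = self.value(x).reshape(b, self.heads, self.head_dim, n)
        # Broadcast-sum the two embeddings into one vector per position.
        r = (self.rel_h + self.rel_w).reshape(1, self.heads, self.head_dim, n)
        # Attention logits: content-content (q.k) plus content-position (q.r).
        q_t = q.transpose(-2, -1)                      # (b, heads, n, head_dim)
        logits = q_t @ k + q_t @ r                     # (b, heads, n, n)
        attn = torch.softmax(logits, dim=-1)           # normalize over keys
        # Weighted sum of values for every query position.
        out = v @ attn.transpose(-2, -1)               # (b, heads, head_dim, n)
        return out.reshape(b, c, h, w)


x = torch.randn(2, 64, 14, 14)
out = SketchMHSA(channels=64, height=14, width=14, heads=4)(x)
print(out.shape)  # torch.Size([2, 64, 14, 14])
```

The layer preserves the input shape, which is what lets it stand in for a spatial convolution inside a bottleneck block. Because attention is computed over all `height * width` positions, the logits tensor grows quadratically with the number of positions, so layers like this are usually applied to low-resolution feature maps such as the 14x14 case above.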
Reference
- Paper link
- Authors: Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
- Organization: UC Berkeley, Google Research