pytorch-image-models
Improve convnext v1 amp speed.
Modification
- Add an option to fuse layerscale into the last linear layer in Mlp. Fewer elementwise operations improve AMP train/infer speed.
- Reshape x for Mlp, which slightly improves speed.
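Together with the fast_norm option, convnext train/infer is much faster: in the benchmark below, convnext_tiny with all three options enabled (conv_mlp=False) reaches about 1.35x inference and 1.26x training throughput under AMP.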
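To illustrate the layerscale fusion described above, here is a minimal sketch, not the code from this PR. It assumes layerscale is a per-channel multiply by `gamma` applied right after the last `nn.Linear` of the Mlp. The names `fc2`, `gamma`, `layerscale_reference`, and `layerscale_fused` are hypothetical. Scaling the rows of the weight and the bias by `gamma` gives the same output while removing the separate elementwise multiply over the activation tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

dim, hidden = 8, 32
fc2 = nn.Linear(hidden, dim)                  # stands in for the last linear of the Mlp
gamma = nn.Parameter(torch.rand(dim) + 0.5)   # per-channel layerscale weights (illustrative values)


def layerscale_reference(x: torch.Tensor) -> torch.Tensor:
    # Unfused: linear, then an elementwise multiply over the whole activation.
    return fc2(x) * gamma


def layerscale_fused(x: torch.Tensor) -> torch.Tensor:
    # Fused: scale the (out_features, in_features) weight row-wise and the bias,
    # so the multiply touches only the parameters. gamma stays learnable because
    # the scaling is recomputed on every forward pass.
    weight = fc2.weight * gamma[:, None]
    bias = fc2.bias * gamma
    return F.linear(x, weight, bias)


x = torch.randn(4, 7, 7, hidden)              # channels-last activations (N, H, W, C)
out_ref = layerscale_reference(x)
out_fused = layerscale_fused(x)
max_abs_diff = (out_ref - out_fused).abs().max().item()
print(f"max abs difference: {max_abs_diff:.2e}")
```

The two paths agree only up to floating-point rounding, and under AMP the rounding differs slightly, which is consistent with the small top-1/top-5 differences in the validation table.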
Benchmark
RTX 3090, PyTorch 1.12 with CUDA 11.3 (pt112-cu113), apex not installed
python benchmark.py --model convnext_tiny --img-size 224 --amp
conv_mlp=False
| fast_norm | reshape_x | fast_layerscale | infer_samples_per_sec | train_samples_per_sec | infer relative | train relative |
|---|---|---|---|---|---|---|
| N | N | N | 2208.06 | 793.61 | 1 | 1 |
| Y | N | N | 2485.99 | 858.76 | 1.12587 | 1.08209 |
| N | Y | N | 2320.33 | 806.36 | 1.05085 | 1.01607 |
| N | N | Y | 2381.74 | 867.64 | 1.07866 | 1.09328 |
| Y | Y | N | 2623.06 | 872.98 | 1.18795 | 1.10001 |
| Y | N | Y | 2816.98 | 980.99 | 1.27577 | 1.23611 |
| N | Y | Y | 2514.63 | 883.65 | 1.13884 | 1.11346 |
| Y | Y | Y | 2991.11 | 1001.68 | 1.35463 | 1.26218 |
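The relative columns appear to be each row's throughput divided by the all-N baseline row. The short sketch below recomputes them from the numbers in the table above (the variable names are illustrative only).

```python
# (fast_norm, reshape_x, fast_layerscale) -> (infer, train) samples/sec, conv_mlp=False
results = {
    ("N", "N", "N"): (2208.06, 793.61),
    ("Y", "N", "N"): (2485.99, 858.76),
    ("N", "Y", "N"): (2320.33, 806.36),
    ("N", "N", "Y"): (2381.74, 867.64),
    ("Y", "Y", "N"): (2623.06, 872.98),
    ("Y", "N", "Y"): (2816.98, 980.99),
    ("N", "Y", "Y"): (2514.63, 883.65),
    ("Y", "Y", "Y"): (2991.11, 1001.68),
}

base_infer, base_train = results[("N", "N", "N")]

# Relative throughput = row throughput / baseline throughput.
relative = {
    flags: (infer / base_infer, train / base_train)
    for flags, (infer, train) in results.items()
}

for flags, (rel_infer, rel_train) in relative.items():
    print(" ".join(flags), f"{rel_infer:.5f}", f"{rel_train:.5f}")
```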
conv_mlp=True
| fast_norm | fast_layerscale | infer_samples_per_sec | train_samples_per_sec | infer relative | train relative |
|---|---|---|---|---|---|
| N | N | 2249.58 | 793.71 | 1 | 1 |
| Y | N | 2535.58 | 859.07 | 1.12713 | 1.08235 |
| N | Y | 2430.69 | 869.33 | 1.08051 | 1.09527 |
| Y | Y | 2875.24 | 982.37 | 1.27812 | 1.23769 |
ImageNet validation
| model | top1 | top5 |
|---|---|---|
| convnext_tiny | 84.444 | 97.326 |
| convnext_tiny + fast_layerscale | 84.454 | 97.33 |
Appreciate the submission, but the added complexity is not worth it IMHO; torch.compile also changes the equation quite a bit.