pytorch-image-models

Improve convnext v1 amp speed.

Open · tascj opened this issue 2 years ago · 1 comment

Modification

  1. Add an option to fuse layerscale into the last linear layer of the Mlp. Fewer elementwise operations improve AMP train/infer speed.
  2. Reshape x for the Mlp, which slightly improves speed.
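
The two changes above can be sketched as follows. This is a minimal illustration and not the PR's actual code: the class `FusedScaleMlp`, its method names, and the tensor sizes are hypothetical. It relies on the identity `gamma * (W h + b) == (gamma[:, None] * W) h + gamma * b`, so the per-channel layerscale multiply can be folded into the weights and bias of the last linear layer, and it flattens the channels-last input to 2D before the Mlp.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedScaleMlp(nn.Module):
    """Hypothetical ConvNeXt-style MLP: fc1 -> GELU -> fc2, then per-channel gamma."""

    def __init__(self, dim: int, hidden: int, init_value: float = 1e-6):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)
        # Layerscale parameter, one scale per output channel.
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward_reference(self, x: torch.Tensor) -> torch.Tensor:
        # Unfused path: a separate elementwise multiply after the last linear.
        return self.fc2(self.act(self.fc1(x))) * self.gamma

    def forward_fused(self, x: torch.Tensor) -> torch.Tensor:
        shape = x.shape
        # Reshape (N, H, W, C) -> (N*H*W, C) so the linears see a 2D input.
        x = x.reshape(-1, shape[-1])
        x = self.act(self.fc1(x))
        # Fold gamma into fc2: scale each output row of the weight, and the bias.
        w = self.fc2.weight * self.gamma[:, None]
        b = self.fc2.bias * self.gamma
        x = F.linear(x, w, b)
        # Output channels equal input channels, so the original shape is valid.
        return x.reshape(shape)


torch.manual_seed(0)
mlp = FusedScaleMlp(dim=8, hidden=32, init_value=0.5)
x = torch.randn(2, 7, 7, 8)  # channels-last layout, as with conv_mlp=False

with torch.no_grad():
    ref = mlp.forward_reference(x)
    fused = mlp.forward_fused(x)

max_abs_diff = (ref - fused).abs().max().item()
print(f"max abs difference: {max_abs_diff:.2e}")
```

In float32 the two paths agree up to rounding error. Under AMP the rounding differs between them, so small differences in outputs are to be expected.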

Together with the fast_norm option, convnext_tiny AMP throughput improves by up to about 1.35x for inference and 1.26x for training in the benchmark below.

Benchmark

RTX 3090, PyTorch 1.12 with CUDA 11.3 (pt112-cu113), apex not installed

python benchmark.py --model convnext_tiny --img-size 224 --amp

conv_mlp=False

| fast_norm | reshape_x | fast_layerscale | infer_samples_per_sec | train_samples_per_sec | infer relative | train relative |
|---|---|---|---|---|---|---|
| N | N | N | 2208.06 | 793.61 | 1 | 1 |
| Y | N | N | 2485.99 | 858.76 | 1.12587 | 1.08209 |
| N | Y | N | 2320.33 | 806.36 | 1.05085 | 1.01607 |
| N | N | Y | 2381.74 | 867.64 | 1.07866 | 1.09328 |
| Y | Y | N | 2623.06 | 872.98 | 1.18795 | 1.10001 |
| Y | N | Y | 2816.98 | 980.99 | 1.27577 | 1.23611 |
| N | Y | Y | 2514.63 | 883.65 | 1.13884 | 1.11346 |
| Y | Y | Y | 2991.11 | 1001.68 | 1.35463 | 1.26218 |
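
The "relative" columns appear to be each row's throughput divided by the all-N baseline row. A short check of that reading, using only numbers from the conv_mlp=False table (the helper name `relative` is hypothetical):

```python
# (infer, train) throughput in samples/sec, copied from the conv_mlp=False table.
baseline = (2208.06, 793.61)      # fast_norm=N, reshape_x=N, fast_layerscale=N
all_enabled = (2991.11, 1001.68)  # fast_norm=Y, reshape_x=Y, fast_layerscale=Y


def relative(row, base):
    """Return each throughput in `row` divided by the matching baseline value."""
    return tuple(round(r / b, 5) for r, b in zip(row, base))


infer_rel, train_rel = relative(all_enabled, baseline)
print(infer_rel, train_rel)
```

The same division applies to the conv_mlp=True table, with its own N/N row as the baseline.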

conv_mlp=True

| fast_norm | fast_layerscale | infer_samples_per_sec | train_samples_per_sec | infer relative | train relative |
|---|---|---|---|---|---|
| N | N | 2249.58 | 793.71 | 1 | 1 |
| Y | N | 2535.58 | 859.07 | 1.12713 | 1.08235 |
| N | Y | 2430.69 | 869.33 | 1.08051 | 1.09527 |
| Y | Y | 2875.24 | 982.37 | 1.27812 | 1.23769 |

ImageNet validation

| model | top1 | top5 |
|---|---|---|
| convnext_tiny | 84.444 | 97.326 |
| convnext_tiny + fast_layerscale | 84.454 | 97.33 |

tascj · Mar 28 '23 01:03


Appreciate the submission, but the added complexity is not worth it imho; torch.compile also changes the equation quite a bit.

rwightman · Jul 26 '24 17:07