pytorch-image-models Improve convnext v1 amp speed.

Open tascj opened this issue 2 years ago • 1 comments

Add an option to fuse layerscale into the last linear layer in Mlp. Fewer elementwise operations improve AMP train/infer speed.
Reshape x for Mlp, which slightly improves speed.
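
The fusion relies on a simple identity: scaling the output of a linear layer per channel, gamma * (W x + b), equals a linear layer with weights gamma * W (row-wise) and bias gamma * b. The following is a minimal plain-Python sketch of that algebra only, not the code from this PR; the helper names `linear` and `fuse_layerscale` and all numbers are hypothetical.

```python
# Sketch of folding a per-channel LayerScale (gamma) into the preceding
# linear layer. Names and values are illustrative assumptions, not PR code.

def linear(x, weight, bias):
    """Compute y[o] = sum_i weight[o][i] * x[i] + bias[o]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def fuse_layerscale(weight, bias, gamma):
    """Scale each output row of the weight, and the bias, by gamma."""
    fused_weight = [[g * w for w in row] for row, g in zip(weight, gamma)]
    fused_bias = [g * b for b, g in zip(bias, gamma)]
    return fused_weight, fused_bias

weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]  # 2 outputs, 3 inputs
bias = [0.1, -0.2]
gamma = [0.1, 2.0]                               # one scale per output channel
x = [1.0, 2.0, 3.0]

# Unfused: linear layer followed by a separate elementwise multiply.
unfused = [g * y for g, y in zip(gamma, linear(x, weight, bias))]

# Fused: the scale is applied to the parameters, not to the activation.
fused = linear(x, *fuse_layerscale(weight, bias, gamma))

max_abs_diff = max(abs(a - b) for a, b in zip(unfused, fused))
print(max_abs_diff)
```

The benefit presumably comes from moving the multiply off the large activation tensor and onto the much smaller weight and bias. During training gamma is a learned parameter, so the scaled weight would have to be recomputed on each forward pass rather than folded in once.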

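Together with the fast_norm option, convnext train/infer is much faster: with all three options enabled, the first table below shows about 1.35x inference and 1.26x training throughput over the baseline.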

RTX 3090, PyTorch 1.12 + CUDA 11.3 (pt112-cu113), apex not installed

python benchmark.py --model convnext_tiny --img-size 224 --amp

fast_norm	reshape_x	fast_layerscale	infer_samples_per_sec	train_samples_per_sec	infer relative	train relative
N	N	N	2208.06	793.61	1	1
Y	N	N	2485.99	858.76	1.12587	1.08209
N	Y	N	2320.33	806.36	1.05085	1.01607
N	N	Y	2381.74	867.64	1.07866	1.09328
Y	Y	N	2623.06	872.98	1.18795	1.10001
Y	N	Y	2816.98	980.99	1.27577	1.23611
N	Y	Y	2514.63	883.65	1.13884	1.11346
Y	Y	Y	2991.11	1001.68	1.35463	1.26218
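
The relative columns appear to be each row's throughput divided by the all-N baseline row. A small sketch that recomputes them for the all-Y row, using the numbers from the table above (the variable names are illustrative):

```python
# Recompute the "relative" columns as throughput / baseline throughput.
baseline_infer, baseline_train = 2208.06, 793.61  # row N N N
all_on_infer, all_on_train = 2991.11, 1001.68     # row Y Y Y

infer_relative = all_on_infer / baseline_infer
train_relative = all_on_train / baseline_train

print(f"infer relative: {infer_relative:.5f}")
print(f"train relative: {train_relative:.5f}")
```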

fast_norm	fast_layerscale	infer_samples_per_sec	train_samples_per_sec	infer relative	train relative
N	N	2249.58	793.71	1	1
Y	N	2535.58	859.07	1.12713	1.08235
N	Y	2430.69	869.33	1.08051	1.09527
Y	Y	2875.24	982.37	1.27812	1.23769

model	top1	top5
convnext_tiny	84.444	97.326
convnext_tiny + fast_layerscale	84.454	97.33

Mar 28 '23 01:03 tascj

Appreciate the submission, but the added complexity is not worth it imho; torch.compile also changes the equation quite a bit.

Jul 26 '24 17:07 rwightman