[InstanceNorm Optimize x86] AVX512/AVX/SSE intrinsic with elempack merged

Open LRY89757 opened this issue 3 years ago • 7 comments

Add AVX512/AVX/SSE intrinsics for instancenorm.

Jul 21 '22 02:07 LRY89757

Codecov Report

Merging #4062 (b0e9531) into master (00c08d7) will decrease coverage by 1.46%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4062      +/-   ##
==========================================
- Coverage   94.43%   92.97%   -1.47%     
==========================================
  Files         748      749       +1     
  Lines      179005   178735     -270     
==========================================
- Hits       169047   166181    -2866     
- Misses       9958    12554    +2596

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/layer/x86/instancenorm_x86.cpp | `100.00% <100.00%> (ø)` | |
| src/layer/x86/convolution_2x2_pack8.h | `2.75% <0.00%> (-97.25%)` | :arrow_down: |
| src/layer/x86/deconvolution_pack8.h | `10.76% <0.00%> (-89.24%)` | :arrow_down: |
| src/layer/x86/convolution_sgemm_pack8.h | `14.24% <0.00%> (-85.24%)` | :arrow_down: |
| src/layer/x86/convolution_sgemm_pack4to8.h | `29.16% <0.00%> (-70.84%)` | :arrow_down: |
| src/layer/x86/convolution_pack8.h | `34.42% <0.00%> (-65.58%)` | :arrow_down: |
| src/layer/x86/convolution_pack4to8.h | `42.85% <0.00%> (-55.11%)` | :arrow_down: |
| ...c/layer/x86/convolution_winograd_transform_pack8.h | `54.90% <0.00%> (-45.10%)` | :arrow_down: |
| src/layer/x86/convolution_3x3_pack1to8.h | `39.95% <0.00%> (-40.04%)` | :arrow_down: |
| src/layer/x86/convolution_winograd_dot_pack8.h | `60.24% <0.00%> (-39.16%)` | :arrow_down: |
... and 46 more


Jul 21 '22 03:07 codecov-commenter

> missing avx/avx512 optimization for pack4 and avx512 optimization for pack8 ?

If so, does the x86 part of batchnorm also need further optimization? @nihui

Jul 24 '22 03:07 LRY89757

> > missing avx/avx512 optimization for pack4 and avx512 optimization for pack8 ?
>
> If so, does the x86 part of batchnorm also need further optimization? @nihui

You could merge the multiple elempack code paths in batchnorm.

Jul 24 '22 14:07 nihui

> > > missing avx/avx512 optimization for pack4 and avx512 optimization for pack8 ?
> >
> > If so, does the x86 part of batchnorm also need further optimization? @nihui
>
> You could merge the multiple elempack codepath in batchnorm

OK, I will try to merge the elempack code paths into one.

Jul 25 '22 08:07 LRY89757
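
For readers following the thread, the idea of merging the elempack code paths can be sketched roughly as below. This is a scalar illustration only, not the code from this PR: the function name `instancenorm_packed`, its signature, and the interleaved layout are assumptions made for the example. It treats one packed channel group as a flat array and uses the lane index `i % elempack`, so a single routine covers pack1, pack4, pack8 and pack16 instead of one branch per pack size. An intrinsic version would presumably step through the same flat loops by the vector width; that part is left out here.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not ncnn's API): instance-normalizes one packed
// channel group in place.
// Layout assumption: `elempack` channels are interleaved, so lane k at
// spatial position i is stored at ptr[i * elempack + k].
// `size` is the number of spatial positions (w * h).
static void instancenorm_packed(float* ptr, int size, int elempack,
                                const float* gamma, const float* beta, float eps)
{
    const int total = size * elempack;
    std::vector<float> mean(elempack, 0.f);
    std::vector<float> var(elempack, 0.f);

    // Pass 1: per-lane mean over the flat array.
    for (int i = 0; i < total; i++)
        mean[i % elempack] += ptr[i];
    for (int k = 0; k < elempack; k++)
        mean[k] /= size;

    // Pass 2: per-lane (biased) variance.
    for (int i = 0; i < total; i++)
    {
        const float d = ptr[i] - mean[i % elempack];
        var[i % elempack] += d * d;
    }
    for (int k = 0; k < elempack; k++)
        var[k] /= size;

    // Fold the normalization into y = a * x + b per lane.
    std::vector<float> a(elempack);
    std::vector<float> b(elempack);
    for (int k = 0; k < elempack; k++)
    {
        a[k] = gamma[k] / std::sqrt(var[k] + eps);
        b[k] = beta[k] - mean[k] * a[k];
    }

    // Pass 3: apply the per-lane scale and shift.
    for (int i = 0; i < total; i++)
        ptr[i] = ptr[i] * a[i % elempack] + b[i % elempack];
}
```

Calling it with `elempack = 1` reduces to the plain per-channel case, which is the point of merging: the pack size becomes a parameter of one loop structure rather than a reason to duplicate it.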