[InstanceNorm Optimize x86] AVX512/AVX/SSE intrinsic with elempack merged

Open LRY89757 opened this issue 3 years ago • 7 comments

  • Add the AVX512/AVX/SSE intrinsics for instancenorm

LRY89757 avatar Jul 21 '22 02:07 LRY89757

Codecov Report

Merging #4062 (b0e9531) into master (00c08d7) will decrease coverage by 1.46%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4062      +/-   ##
==========================================
- Coverage   94.43%   92.97%   -1.47%     
==========================================
  Files         748      749       +1     
  Lines      179005   178735     -270     
==========================================
- Hits       169047   166181    -2866     
- Misses       9958    12554    +2596     
| Impacted Files | Coverage Δ |
|---|---|
| src/layer/x86/instancenorm_x86.cpp | 100.00% <100.00%> (ø) |
| src/layer/x86/convolution_2x2_pack8.h | 2.75% <0.00%> (-97.25%) |
| src/layer/x86/deconvolution_pack8.h | 10.76% <0.00%> (-89.24%) |
| src/layer/x86/convolution_sgemm_pack8.h | 14.24% <0.00%> (-85.24%) |
| src/layer/x86/convolution_sgemm_pack4to8.h | 29.16% <0.00%> (-70.84%) |
| src/layer/x86/convolution_pack8.h | 34.42% <0.00%> (-65.58%) |
| src/layer/x86/convolution_pack4to8.h | 42.85% <0.00%> (-55.11%) |
| ...c/layer/x86/convolution_winograd_transform_pack8.h | 54.90% <0.00%> (-45.10%) |
| src/layer/x86/convolution_3x3_pack1to8.h | 39.95% <0.00%> (-40.04%) |
| src/layer/x86/convolution_winograd_dot_pack8.h | 60.24% <0.00%> (-39.16%) |

... and 46 more

codecov-commenter avatar Jul 21 '22 03:07 codecov-commenter

> missing avx/avx512 optimization for pack4 and avx512 optimization for pack8?

If so, does the x86 part of batchnorm also need further optimization? @nihui

LRY89757 avatar Jul 24 '22 03:07 LRY89757

> missing avx/avx512 optimization for pack4 and avx512 optimization for pack8?
>
> If so, does the x86 part of batchnorm also need further optimization? @nihui

You could merge the multiple elempack code paths in batchnorm.

nihui avatar Jul 24 '22 14:07 nihui

> missing avx/avx512 optimization for pack4 and avx512 optimization for pack8?
>
> If so, does the x86 part of batchnorm also need further optimization? @nihui
>
> You could merge the multiple elempack code paths in batchnorm

OK, I will try to merge the elempack code paths into one.

LRY89757 avatar Jul 25 '22 08:07 LRY89757
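
For readers following the thread, the idea of merging the elempack code paths can be sketched roughly as below. This is a minimal illustration, not the code from this PR or from ncnn's batchnorm: the function name `affine_packed_channel` and its parameters are hypothetical, and it assumes a batchnorm-style per-channel update `y = x * b + a` over a buffer of `wh * elempack` floats in which consecutive groups of `elempack` values belong to different channels. Instead of one loop per packing, a single SIMD loop runs over the flat buffer, and only the way the coefficients are loaded depends on `elempack`. Only the SSE width is shown; wider AVX/AVX512 stages would presumably follow the same pattern ahead of it.

```cpp
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper (not an ncnn API): applies y = x * b + a in place to one
// packed channel. ptr holds wh * elempack floats; a and b each hold elempack
// coefficients, one per lane of the packed channel.
static void affine_packed_channel(float* ptr, int wh, int elempack, const float* a, const float* b)
{
    assert(elempack >= 1);
    const int size = wh * elempack;
    int i = 0;
#if defined(__SSE__)
    if (elempack == 4 || elempack == 1)
    {
        // pack4: each lane has its own coefficient; pack1: broadcast a single one.
        __m128 _a = (elempack == 4) ? _mm_loadu_ps(a) : _mm_set1_ps(a[0]);
        __m128 _b = (elempack == 4) ? _mm_loadu_ps(b) : _mm_set1_ps(b[0]);
        for (; i + 3 < size; i += 4)
        {
            __m128 _p = _mm_loadu_ps(ptr + i);
            _p = _mm_add_ps(_mm_mul_ps(_p, _b), _a);
            _mm_storeu_ps(ptr + i, _p);
        }
    }
#endif
    // Scalar tail (pack1 remainder) and fallback for any other packing or non-SSE builds.
    for (; i < size; i++)
    {
        ptr[i] = ptr[i] * b[i % elempack] + a[i % elempack];
    }
}
```

A caller would invoke this once per packed channel, passing pointers to that channel's `elempack` coefficients. The design point is that the loop body is written once per SIMD width rather than once per (width, elempack) pair.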