GAPI Fluid: SIMD SSE41 for Resize F32C1.

Open anna-khakimova opened this issue 3 years ago • 1 comments

SIMD SSE41 for the Resize F32C1.

Performance: ResizeF32C1_SIMD_SSE.xlsx

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due to high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

anna-khakimova avatar Mar 04 '22 15:03 anna-khakimova

@anna-khakimova friendly reminder.

asmorkalov avatar Sep 07 '22 13:09 asmorkalov

@TolyaTalamanov Is the patch still relevant? If so, could you drive it to merge by yourself?

asmorkalov avatar Apr 06 '23 16:04 asmorkalov

> @TolyaTalamanov Is the patch still relevant? If so, could you drive it to merge by yourself?

@dmatveev Could you comment? I don't recall why we were doing this.

TolyaTalamanov avatar Apr 07 '23 07:04 TolyaTalamanov

@dmatveev Friendly reminder.

asmorkalov avatar May 05 '23 07:05 asmorkalov

@asmorkalov I'll have a look at this. At the very least, I'll bring the patch up to date and see what effect it has on F32 resize.

dmatveev avatar May 11 '23 16:05 dmatveev

@rgarnov could you please have a look?

dmatveev avatar May 23 '23 08:05 dmatveev

Checked performance of 4.x against this branch:

dm@irlvmvdmitryma:~/code/opencv_fluid/modules/ts/misc$ python3 ./summary.py  ~/code/opencv_fluid_BUILD/4x.xml ~/code/opencv_fluid_BUILD/anna_simd.xml | grep 32FC1
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 0.5, 0.5, { gapi.kernel_package })                  0.012  0.012     0.97
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 0.5, 0.25, { gapi.kernel_package })                 0.007  0.008     0.95
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 0.5, 2, { gapi.kernel_package })                    0.037  0.038     0.96
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 0.25, 0.5, { gapi.kernel_package })                 0.008  0.008     0.98
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 0.25, 0.25, { gapi.kernel_package })                0.006  0.006     0.97
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 0.25, 2, { gapi.kernel_package })                   0.023  0.023     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 2, 0.5, { gapi.kernel_package })                    0.033  0.033     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 2, 0.25, { gapi.kernel_package })                   0.018  0.018     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 128x128, 2, 2, { gapi.kernel_package })                      0.120  0.120     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 0.5, 0.5, { gapi.kernel_package })                  0.148  0.148     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 0.5, 0.25, { gapi.kernel_package })                 0.075  0.075     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 0.5, 2, { gapi.kernel_package })                    0.557  0.550     1.01
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 0.25, 0.5, { gapi.kernel_package })                 0.088  0.088     0.99
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 0.25, 0.25, { gapi.kernel_package })                0.043  0.044     0.98
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 0.25, 2, { gapi.kernel_package })                   0.298  0.298     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 2, 0.5, { gapi.kernel_package })                    0.534  0.535     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 2, 0.25, { gapi.kernel_package })                   0.271  0.270     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 640x480, 2, 2, { gapi.kernel_package })                      2.119  2.117     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 0.5, 0.5, { gapi.kernel_package })                 0.416  0.423     0.98
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 0.5, 0.25, { gapi.kernel_package })                0.212  0.216     0.98
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 0.5, 2, { gapi.kernel_package })                   1.639  1.625     1.01
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 0.25, 0.5, { gapi.kernel_package })                0.233  0.236     0.99
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 0.25, 0.25, { gapi.kernel_package })               0.118  0.122     0.97
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 0.25, 2, { gapi.kernel_package })                  0.844  0.842     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 2, 0.5, { gapi.kernel_package })                   1.599  1.594     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 2, 0.25, { gapi.kernel_package })                  0.794  0.797     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1280x720, 2, 2, { gapi.kernel_package })                     6.512  6.520     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 0.5, 0.5, { gapi.kernel_package })                1.008  1.035     0.97
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 0.5, 0.25, { gapi.kernel_package })               0.477  0.476     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 0.5, 2, { gapi.kernel_package })                  3.821  4.011     0.95
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 0.25, 0.5, { gapi.kernel_package })               0.596  0.605     0.98
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 0.25, 0.25, { gapi.kernel_package })              0.267  0.274     0.98
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 0.25, 2, { gapi.kernel_package })                 2.046  2.216     0.92
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 2, 0.5, { gapi.kernel_package })                  3.697  3.834     0.96
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 2, 0.25, { gapi.kernel_package })                 1.819  1.818     1.00
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 2, 2, { gapi.kernel_package })                    14.901 14.888    1.00
TestPerformance::ResizeInSimpleGraphPerfTestFluid/ResizeInSimpleGraphPerfTest::(compare_f, 32FC1, 128x128, 0.5, 0.5, { gapi.kernel_package })   0.026  0.024     1.07
TestPerformance::ResizeInSimpleGraphPerfTestFluid/ResizeInSimpleGraphPerfTest::(compare_f, 32FC1, 640x480, 0.5, 0.5, { gapi.kernel_package })   0.278  0.272     1.02
TestPerformance::ResizeInSimpleGraphPerfTestFluid/ResizeInSimpleGraphPerfTest::(compare_f, 32FC1, 1280x720, 0.5, 0.5, { gapi.kernel_package })  0.809  0.792     1.02
TestPerformance::ResizeInSimpleGraphPerfTestFluid/ResizeInSimpleGraphPerfTest::(compare_f, 32FC1, 1920x1080, 0.5, 0.5, { gapi.kernel_package }) 2.246  2.214     1.01
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 128x128, 30x30, { gapi.kernel_package })                             0.005  0.006     0.96
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 128x128, 64x64, { gapi.kernel_package })                             0.011  0.012     0.97
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 640x480, 30x30, { gapi.kernel_package })                             0.007  0.007     0.98
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 640x480, 64x64, { gapi.kernel_package })                             0.014  0.014     0.97
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 1280x720, 30x30, { gapi.kernel_package })                            0.008  0.008     0.99
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 1280x720, 64x64, { gapi.kernel_package })                            0.018  0.017     1.10
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 1920x1080, 30x30, { gapi.kernel_package })                           0.009  0.009     0.99
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 1920x1080, 64x64, { gapi.kernel_package })                           0.036  0.021     1.76

The only notable change is in the last case, the 1920x1080 to 64x64 resize (0.036 vs. 0.021, factor 1.76), which may be a rare case for F32. Given this, given that there have been no updates from Anna for the last year, and given that there are now a lot of merge conflicts between this branch and 4.x, I propose to close this PR for now.

dmatveev avatar May 29 '23 09:05 dmatveev
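
For anyone re-running the comparison, the "only the last case is notable" reading of the table above can be checked mechanically. The following is a minimal sketch, not part of the PR: it assumes each row of the `summary.py` output ends with three numbers (time for `4x.xml`, time for `anna_simd.xml`, and the reported factor), and it uses a hypothetical 15% tolerance around 1.00 to decide what counts as notable. The helper names `parse_rows` and `notable` are made up for illustration; the sample rows are copied from the output above.

```python
import re

# Assumed row layout: test name, then three trailing numbers separated only by
# whitespace (baseline time, patched time, factor as printed by summary.py).
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<base>\d+\.\d+)\s+(?P<patch>\d+\.\d+)\s+(?P<factor>\d+\.\d+)\s*$"
)


def parse_rows(text):
    """Return (name, base, patch, factor) tuples for every line that matches ROW."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(
                (
                    match["name"],
                    float(match["base"]),
                    float(match["patch"]),
                    float(match["factor"]),
                )
            )
    return rows


def notable(rows, tolerance=0.15):
    """Keep rows whose reported factor differs from 1.00 by more than `tolerance`.

    The 0.15 default is an arbitrary illustrative threshold, not a project rule.
    """
    return [row for row in rows if abs(row[3] - 1.0) > tolerance]


# A few rows copied from the comparison above.
SAMPLE = """
TestPerformance::ResizeFxFyPerfTestFluid/ResizeFxFyPerfTest::(compare_f, 32FC1, 1, 1920x1080, 0.25, 2, { gapi.kernel_package })  2.046  2.216  0.92
TestPerformance::ResizeInSimpleGraphPerfTestFluid/ResizeInSimpleGraphPerfTest::(compare_f, 32FC1, 128x128, 0.5, 0.5, { gapi.kernel_package })  0.026  0.024  1.07
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 1280x720, 64x64, { gapi.kernel_package })  0.018  0.017  1.10
TestPerformance::ResizePerfTestFluid/ResizePerfTest::(compare_f, 32FC1, 1, 1920x1080, 64x64, { gapi.kernel_package })  0.036  0.021  1.76
"""

if __name__ == "__main__":
    for name, base, patch, factor in notable(parse_rows(SAMPLE)):
        print(f"{factor:.2f}  {base:.3f} -> {patch:.3f}  {name}")
```

With these sample rows, only the 1920x1080 to 64x64 `ResizePerfTest` case passes the filter. A tighter tolerance would also pick up rows such as the 0.92 and 1.10 cases, so the threshold should be chosen with the measurement noise of the machine in mind.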