opencv_contrib
opencv_contrib copied to clipboard
ximgproc: optimize guidedfilter function for ARM64 using NEON intrinsics
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
-
[x] I agree to contribute to the project under Apache 2 License.
-
[x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
-
[x] The PR is proposed to the proper branch
-
This PR introduces an ARM64-specific optimization for the add_mul function in edgeaware_filters_common.cpp using NEON intrinsics.
-
The optimization is applied only when CV_NEON is defined and the runtime NEON check (cv::checkHardwareSupport(CV_CPU_NEON)) passes.
-
The SIMD implementation leverages NEON instructions (vld1q_f32, vmulq_f32, vaddq_f32, vst1q_f32) to accelerate the fused multiply-add operation on 4-element float vectors.
-
This brings parity with existing x64 SIMD optimizations using SSE1.
-
The addition of the ARM64 NEON optimization for the add_mul function in edgeaware_filters_common.cpp has led to performance improvements in some tests of GuidedFilter function.