opencv_contrib ximgproc: optimize guidedfilter function for ARM64 using NEON intrinsics

Open pratham-mcw opened this issue 4 months ago • 1 comments

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

[x] I agree to contribute to the project under Apache 2 License.
[x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
[x] The PR is proposed to the proper branch
This PR introduces an ARM64-specific optimization for the add_mul function in edgeaware_filters_common.cpp using NEON intrinsics.
The optimization is applied only when CV_NEON is defined and the runtime NEON check (cv::checkHardwareSupport(CV_CPU_NEON)) passes.
The SIMD implementation uses NEON intrinsics (vld1q_f32, vmulq_f32, vaddq_f32, vst1q_f32) to accelerate the multiply-add operation, processing four floats per iteration. The multiply and the add are issued as separate instructions, so this is not a fused multiply-add.
This brings parity with existing x64 SIMD optimizations using SSE1.
With this change, some of the GuidedFilter performance tests show improved run times on ARM64.
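For reviewers who want to see the shape of the change, here is a minimal sketch of the loop described above, not the actual patch. It assumes add_mul computes dst[i] += src1[i] * src2[i] over w floats. The name add_mul_sketch is hypothetical, and the sketch guards on the compiler macro __ARM_NEON instead of CV_NEON so that it builds without OpenCV. It also omits the runtime cv::checkHardwareSupport(CV_CPU_NEON) check that the PR performs.

```cpp
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical stand-in for add_mul; assumed semantics: dst[i] += src1[i] * src2[i].
void add_mul_sketch(float* dst, const float* src1, const float* src2, int w)
{
    int j = 0;
#if defined(__ARM_NEON)
    // Vector path: handle four floats per iteration.
    for (; j <= w - 4; j += 4)
    {
        // Separate multiply and add, matching the intrinsics named in the description.
        float32x4_t prod = vmulq_f32(vld1q_f32(src1 + j), vld1q_f32(src2 + j));
        float32x4_t acc  = vaddq_f32(vld1q_f32(dst + j), prod);
        vst1q_f32(dst + j, acc);
    }
#endif
    // Scalar tail; this is also the whole loop when NEON is not available.
    for (; j < w; ++j)
        dst[j] += src1[j] * src2[j];
}
```

The scalar loop after the vector loop handles widths that are not a multiple of four, so callers do not need to pad their buffers.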

Jul 29 '25 07:07 pratham-mcw