opencv_contrib

ximgproc: optimize guidedfilter function for ARM64 using NEON intrinsics

Open pratham-mcw opened this issue 4 months ago • 1 comment

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [x] I agree to contribute to the project under Apache 2 License.

  • [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV

  • [x] The PR is proposed to the proper branch

  • This PR introduces an ARM64-specific optimization for the add_mul function in edgeaware_filters_common.cpp using NEON intrinsics.

  • The optimization is applied only when CV_NEON is defined and the runtime NEON check (cv::checkHardwareSupport(CV_CPU_NEON)) passes.

  • The SIMD implementation uses NEON intrinsics (vld1q_f32, vmulq_f32, vaddq_f32, vst1q_f32) to accelerate the multiply-add operation on 4-element float vectors. The multiply and the add are issued as separate instructions, not as a single fused multiply-add.

  • This brings parity with existing x64 SIMD optimizations using SSE1.

  • The ARM64 NEON optimization of add_mul in edgeaware_filters_common.cpp improves performance in some GuidedFilter tests. [image]
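
For readers who want to see the shape of the change, here is a minimal sketch of the vectorized loop described above. It is not the code from the patch. The name add_mul_sketch, the const-qualified signature, and the SKETCH_HAS_NEON macro are hypothetical. It assumes add_mul accumulates element-wise products into dst (dst[i] += src1[i] * src2[i]). To stay self-contained it uses a plain compiler-macro check instead of CV_NEON and omits the runtime cv::checkHardwareSupport(CV_CPU_NEON) test.

```cpp
#include <cstddef>

// Assumption: detect NEON from compiler macros; the PR itself relies on CV_NEON.
#if defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical stand-in for add_mul: dst[i] += src1[i] * src2[i] for i in [0, w).
void add_mul_sketch(float* dst, const float* src1, const float* src2, int w)
{
    int j = 0;
#if SKETCH_HAS_NEON
    // Process four floats per iteration with 128-bit NEON registers.
    for (; j + 4 <= w; j += 4)
    {
        float32x4_t a = vld1q_f32(src1 + j);   // load 4 floats from src1
        float32x4_t b = vld1q_f32(src2 + j);   // load 4 floats from src2
        float32x4_t c = vld1q_f32(dst + j);    // load 4 floats from dst
        c = vaddq_f32(c, vmulq_f32(a, b));     // c += a * b (separate mul and add)
        vst1q_f32(dst + j, c);                 // store the result back to dst
    }
#endif
    // Scalar tail; also the full fallback path on non-NEON builds.
    for (; j < w; ++j)
        dst[j] += src1[j] * src2[j];
}
```

The scalar loop after the vector loop handles widths that are not a multiple of four, so both paths should produce the same results for any w.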

pratham-mcw · Jul 29 '25 07:07