Constant Median Filter Run time greater than 200ms

Open DLFCW opened this issue 5 years ago • 3 comments

With radius 5 on a 3.4 GHz CPU, I used AVX2 to speed up the histogram add and histogram subtract steps, but the run time is still greater than 200 ms for an image of size 1280*1024.

DLFCW avatar Oct 28 '19 02:10 DLFCW

@DLFCW My CPU runs at 3.9 GHz, and the constant median filter with radius 5 on a 1280x1024 image takes about 140 ms. Also, HISTOGRAM_LEN should be 256, not 512: https://github.com/Ldpe2G/ArmNeonOptimization/blob/master/ConstantTimeMedianFilter/src/constant_time_median_filter_uint16.h#L8 After you change it to 256, you should see a small speedup. By the way, if your image is very large and the filter radius is small, this algorithm is not recommended, because you need to allocate a large chunk of memory to store the column histograms. You can simply try the same parallel strategy that the normal median filter uses: https://github.com/Ldpe2G/ArmNeonOptimization/blob/master/ConstantTimeMedianFilter/src/normal_median_filter_uint16.cpp#L27

Ldpe2G avatar Oct 30 '19 15:10 Ldpe2G
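
To put rough numbers on the memory point above, here is a minimal sketch that computes the storage needed for one histogram per image column. The function name `column_histogram_bytes` is hypothetical, and the 16-bit counter type is an assumption. The repository's actual layout may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed to keep one histogram per image column.
// width        - number of image columns
// bins         - histogram length (HISTOGRAM_LEN in the thread)
// counter_size - size of one histogram counter in bytes (assumed 2 here)
constexpr std::size_t column_histogram_bytes(std::size_t width,
                                             std::size_t bins,
                                             std::size_t counter_size) {
    return width * bins * counter_size;
}

// 1280 columns with 256 bins of 16-bit counters.
constexpr std::size_t kBytes256 =
    column_histogram_bytes(1280, 256, sizeof(std::uint16_t));

// The same image with 512 bins needs twice as much.
constexpr std::size_t kBytes512 =
    column_histogram_bytes(1280, 512, sizeof(std::uint16_t));
```

Under these assumptions, halving HISTOGRAM_LEN halves both the memory and the number of bins touched each time a column histogram is added to or subtracted from the kernel histogram.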

Do you know Halcon, a machine vision library? It runs the constant median filter in only 0.9 ms under the same conditions.

DLFCW avatar Nov 06 '19 08:11 DLFCW

No, I have not heard of it before; the library must be heavily optimized. I have only implemented the basic algorithm described in the paper, and the paper describes some optimization tips that I did not try.

Ldpe2G avatar Nov 06 '19 13:11 Ldpe2G
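
For readers unfamiliar with the column-histogram scheme discussed in this thread, the following is a minimal, unoptimized sketch for 8-bit images. It is not the repository's code. The name `median_filter_sketch`, the truncated-window border handling, and the choice of the upper median for even counts are all assumptions made for illustration. It uses no SIMD and none of the further optimizations mentioned above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBins = 256;  // one bin per 8-bit value

// Return the value at sorted index count/2 (upper median for even counts).
static std::uint8_t histogram_median(const std::array<int, kBins>& h, int count) {
    int sum = 0;
    for (int v = 0; v < kBins; ++v) {
        sum += h[v];
        if (sum > count / 2) return static_cast<std::uint8_t>(v);
    }
    return 255;
}

// Hypothetical sketch: median filter with radius r on a row-major 8-bit image.
// Windows are truncated at the image border (an assumption, not the paper's rule).
std::vector<std::uint8_t> median_filter_sketch(const std::vector<std::uint8_t>& src,
                                               int width, int height, int r) {
    assert(width > 0 && height > 0 && r >= 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> dst(src.size());
    // One histogram per column, covering the rows of the current window.
    std::vector<std::array<std::uint16_t, kBins>> col(width);
    for (auto& c : col) c.fill(0);
    for (int yy = 0; yy <= std::min(r, height - 1); ++yy)
        for (int x = 0; x < width; ++x) ++col[x][src[yy * width + x]];

    for (int y = 0; y < height; ++y) {
        if (y > 0) {  // slide every column histogram down by one row
            const int leaving = y - r - 1, entering = y + r;
            for (int x = 0; x < width; ++x) {
                if (leaving >= 0) --col[x][src[leaving * width + x]];
                if (entering < height) ++col[x][src[entering * width + x]];
            }
        }
        const int rows = std::min(y + r, height - 1) - std::max(y - r, 0) + 1;
        std::array<int, kBins> kernel{};  // histogram of the whole window
        int cols = 0;
        for (int xx = 0; xx <= std::min(r, width - 1); ++xx, ++cols)
            for (int v = 0; v < kBins; ++v) kernel[v] += col[xx][v];

        for (int x = 0; x < width; ++x) {
            if (x > 0) {  // slide the kernel right: drop one column, add one
                const int l = x - r - 1, e = x + r;
                if (l >= 0) { for (int v = 0; v < kBins; ++v) kernel[v] -= col[l][v]; --cols; }
                if (e < width) { for (int v = 0; v < kBins; ++v) kernel[v] += col[e][v]; ++cols; }
            }
            dst[y * width + x] = histogram_median(kernel, rows * cols);
        }
    }
    return dst;
}
```

The work per pixel is one column-histogram subtraction, one addition, and one median scan, each over 256 bins, so it does not grow with the radius. Those per-bin loops are the add and subtract steps that the AVX2 discussion in this thread targets.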