imageproc
imageproc copied to clipboard
Interest for a parallel version of match_template?
Hello,
For a personal project, I wrote a parallel version of match_template, keeping essentially the same algorithm but using a Rayon parallel iterator for the y coordinate and collecting the result.
If there is interest here, I'm happy to do a PR with my changes. It would be good to know whether this should be an addition or a replacement, and if some tests are needed.
kind regards
How fast is it in comparison? (If you benchmarked it)
I'll do a benchmark and report the results tomorrow, but I would say, close to the number of cores.
On an AMD Ryzen 9 7950X 16-Core Processor (running Ubuntu 22.04.2 LTS), looking for a 259 x 415 template in a 750 x 750 image took 10.4s with the current algorithm, and 974ms with my new version, so a speed-up of more than a factor of 10. This is lower than the core count, but not unexpected due to the thermal limitations of the CPU preventing all cores running at the same speed as a single core.
Funnily, running the parallel version but restricted to one cpu with cpulimit takes 9.98s, so less that the non-parallel version. This is reproducible. I do not know whether this is due to the usage of iterators in the parallel version or somehow an artifact of cpulimit.
In any case, I would say that the speed-up is worth considering integration, do you agree? If so, do you want the new function as a replacement or as an additional function? I can then prepare a PR.
Okay, sounds good.
Funnily, running the parallel version but restricted to one cpu with cpulimit takes 9.98s, so less that the non-parallel version
I see
In any case, I would say that the speed-up is worth considering integration
For sure
do you want the new function as a replacement or as an additional function?
Additional is what i would prefer..
Thanks a lot for writing this parallel version btw...i was looking for it and found this issue in just the right time. I hope your PR gets merged asap, otherwise i would like to ask you to share the code with me..let's see, thanks again!
The PR is ready: #527.
I went for simplicity, if we are all happy about this new feature, it might be possible to de-duplicate some code between the original and the parallel version, or even, only use the parallel version by just changing the loop type depending on the availability of rayon.
Any maintainer to approve the PR?
Implemented in https://github.com/image-rs/imageproc/pull/527