ultralytics_ros
Improvement of processing speed
Branch
noetic-devel
Description
The two functions that take the most time are projectCloud() and downsampleCloudMsg(), both of which seem to take similar amounts of time. While I can't think of any way to improve the speed of downsampleCloudMsg(), it seems that projectCloud() could benefit from parallel processing with OpenMP.
Additional
Within projectCloud(), the most time-consuming processes are processPointsWithBbox() and processPointsWithMask(), with euclideanClusterExtraction() taking about a fifth of the time, which is less than I originally thought.
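To illustrate why a per-point bounding-box test is a good fit for OpenMP, here is a minimal sketch. It is not the code from this repository: `Pixel`, `BBox`, and `indicesInsideBbox` are hypothetical names, and the points are assumed to be already projected to image coordinates. Each thread collects matches in its own vector and merges them once at the end, so the shared result is only locked once per thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from ultralytics_ros.
struct Pixel { float u, v; };                       // projected image coordinates
struct BBox { float x_min, y_min, x_max, y_max; };  // 2D detection box

// Returns the indices of all pixels that fall inside the box.
// Compile with -fopenmp to run the loop in parallel; without it the
// pragmas are ignored and the function runs serially with the same result.
std::vector<std::size_t> indicesInsideBbox(const std::vector<Pixel>& pixels, const BBox& box)
{
  std::vector<std::size_t> result;
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(pixels.size());

  #pragma omp parallel
  {
    std::vector<std::size_t> local;  // per-thread buffer, no locking needed

    #pragma omp for nowait schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i)
    {
      const Pixel& p = pixels[static_cast<std::size_t>(i)];
      if (p.u >= box.x_min && p.u <= box.x_max && p.v >= box.y_min && p.v <= box.y_max)
      {
        local.push_back(static_cast<std::size_t>(i));
      }
    }

    #pragma omp critical
    result.insert(result.end(), local.begin(), local.end());
  }

  // Threads finish in arbitrary order, so sort to get a deterministic result.
  std::sort(result.begin(), result.end());
  return result;
}
```

The speedup depends on how many points survive downsampling: with few points per frame, thread startup and the final merge can outweigh the gain.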
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
I've implemented parallel processing using OpenMP in the feature/omp_parallel branch. Testing with the KITTI dataset, the average processing time in syncCallback() improved from 17.5 ms to 14.8 ms, marking an average improvement of 15.5%.
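For anyone who wants to reproduce this kind of measurement, a minimal sketch of averaging per-call wall time with std::chrono is below. `RunningAverageMs` is a hypothetical helper and not part of this repository; the idea is to wrap the body of the callback in `time()` and read `mean()` after the bag has finished playing.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper for illustration: accumulates durations in milliseconds.
class RunningAverageMs
{
public:
  // Record one sample that was measured elsewhere.
  void add(double ms)
  {
    total_ms_ += ms;
    ++count_;
  }

  // Run fn once and record how long it took (monotonic clock).
  template <typename Fn>
  void time(Fn&& fn)
  {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    add(std::chrono::duration<double, std::milli>(end - start).count());
  }

  double mean() const { return count_ == 0 ? 0.0 : total_ms_ / static_cast<double>(count_); }
  std::size_t count() const { return count_; }

private:
  double total_ms_ = 0.0;
  std::size_t count_ = 0;
};
```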
My configuration is as follows:
- WSL Ubuntu-20.04 ROS Noetic
- Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz 2.59 GHz
- 16.0GB RAM
- NVIDIA GeForce RTX 2060
@h-wata, if you have some time, could you please test it in your environment as well? If you notice a significant improvement and find this change beneficial, I'll consider merging it into the noetic-devel branch. Thank you for your cooperation.
Thank you for your implementation.
When I set voxel_leaf_size:=0.01, projectCloud() takes 2.0 seconds on the noetic-devel branch. On the feature/omp_parallel branch, it takes between 0.6 and 0.7 seconds.
However, with three objects in view, the CPU usage on feature/omp_parallel is three times higher than on noetic-devel.
That said, this setting is quite aggressive. With the default setting of 0.1, neither branch should have a problem with computation time.
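To show why a leaf size of 0.01 is so much heavier than 0.1, here is a rough sketch that counts how many voxels a cloud occupies at a given leaf size. It only approximates what a voxel-grid filter does (one output point per occupied voxel) and is not the downsampling code from this repository; `Point3D` and `countOccupiedVoxels` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical point type for illustration.
struct Point3D { float x, y, z; };

// Counts the voxels of edge length leaf_size that contain at least one point.
// A voxel-grid filter keeps roughly one point per occupied voxel, so this is
// an estimate of the cloud size after downsampling.
std::size_t countOccupiedVoxels(const std::vector<Point3D>& cloud, float leaf_size)
{
  std::set<std::tuple<std::int64_t, std::int64_t, std::int64_t>> voxels;
  for (const Point3D& p : cloud)
  {
    voxels.emplace(static_cast<std::int64_t>(std::floor(p.x / leaf_size)),
                   static_cast<std::int64_t>(std::floor(p.y / leaf_size)),
                   static_cast<std::int64_t>(std::floor(p.z / leaf_size)));
  }
  return voxels.size();
}
```

Shrinking the leaf size by a factor of 10 allows up to 1000 times more voxels in the same volume, so on a dense cloud far more points reach projectCloud().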