What is the most time-consuming step in predicting the pose of an object in an image?
Dear Liu Yuan:
I would like to know which step is the most time-consuming when predicting the pose of an object in an image, so that I can build some applications on top of it. Thank you!
I think the paper describes it as the pose refinement step:
Running time. To process an image of size 540×960, Gen6D estimator costs ∼0.64 second in total on a 2080Ti GPU, in which the object detector costs ∼0.1 second, the viewpoint selector costs ∼0.04 second and the refiner with 3 times refinement costs ∼0.5 second.
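To make the breakdown easier to compare, here is a minimal sketch that turns the timings quoted above into shares of the total. The numbers are the approximate figures from the paper; the dictionary keys and variable names are just illustrative, not anything from the Gen6D code.

```python
# Approximate stage timings quoted from the paper
# (seconds, 540x960 image, 2080Ti GPU).
stage_seconds = {
    "detector": 0.10,
    "selector": 0.04,
    "refiner (3 steps)": 0.50,
}

total = sum(stage_seconds.values())

# Fraction of the total runtime spent in each stage.
shares = {name: t / total for name, t in stage_seconds.items()}

for name, t in stage_seconds.items():
    print(f"{name:<18} {t:.2f} s  {shares[name]:6.1%}")
print(f"{'total':<18} {total:.2f} s")
```

With these figures the refiner accounts for roughly 78% of the total 0.64 second.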
Yes, the most time-consuming step is the refiner, because it involves a 3D CNN, which is relatively slow. Moreover, I use the detector and the selector only for initialization, so they only need to run once. Afterward, only the refiner is applied iteratively. To improve efficiency, the key is to speed up the refiner.
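For a rough feel of how the number of refinement steps affects the total, here is a small back-of-the-envelope sketch. It assumes each refinement step costs the same (about 0.5 / 3 second), which is my own simplification and not something stated in the paper; `estimated_total_seconds` is a hypothetical helper, not part of the Gen6D code.

```python
# Approximate timings from the paper (seconds, 2080Ti GPU).
DETECTOR_S = 0.10          # runs once, for initialization
SELECTOR_S = 0.04          # runs once, for initialization
REFINER_3_STEPS_S = 0.50   # reported cost of 3 refinement steps

# Assumption: every refinement step costs the same amount of time.
PER_STEP_S = REFINER_3_STEPS_S / 3


def estimated_total_seconds(refine_steps: int) -> float:
    """Estimate total time for one image with the given number of refiner steps."""
    if refine_steps < 0:
        raise ValueError("refine_steps must be non-negative")
    return DETECTOR_S + SELECTOR_S + refine_steps * PER_STEP_S


for steps in range(4):
    print(f"{steps} step(s): ~{estimated_total_seconds(steps):.2f} s")
```

Under this assumption the one-time initialization is only about 0.14 second, so any real speed-up has to come from fewer or cheaper refiner steps. Actual timings should be measured on your own hardware.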