ViTPose About Inference speed

Are you sure that this method is faster than HRNet? I have tried both with YOLOv5 as the detector, running inference in TensorRT. On the same video, HRNet achieves around 30-35 fps while ViTPose reaches only about 7 fps. The inference tests I have conducted show that HRNet is 6-7 times faster when using larger batch sizes for some reason (around 220 fps per target for FP16 and 450 fps for INT8), while ViTPose achieves around 60 fps per target in TensorRT.

Oct 28 '22 23:10 gpastal24

Thanks for your attention. Please refer to the paper for the settings used in the speed test. With advanced GPUs and the PyTorch framework, ViTPose is faster than HRNet. Besides, inference speed under TensorRT depends not only on the model but also on the build configuration, e.g., the maximum memory allowed or the computation strategy chosen by the optimization search.

Oct 29 '22 00:10 Annbless

Hi, thank you for answering. I had run a test in native PyTorch as well. ViTPose was indeed faster or similar when the batch size was equal to 1. I tested the YOLOv5 pipeline with HRNet and with ViTPose on a webcam for single-person inference, and ViTPose did indeed have higher fps. When I increased the batch size to 10, HRNet was for some reason 2-3 times faster, both in the inference test and on a video. If I understood correctly, these results could be related to my GPU (GTX 1650)? I have attached the PyTorch inference tests in the following pictures. The first row in each picture is with batch size 1, the second with batch size 10.

[Image: Screenshot from 2022-10-29 11-21-14]

Oct 29 '22 08:10 gpastal24
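
Batch-size comparisons like the one above are sensitive to how the timing is done (warm-up iterations, GPU synchronization, and whether fps is counted per batch or per target). The following is a minimal, framework-agnostic sketch of such a measurement, not the script used by anyone in this thread. The names `measure_throughput`, `run_batch`, `sync`, and `fake_model` are hypothetical. With PyTorch on a GPU, `sync` would typically be `torch.cuda.synchronize`, because CUDA kernels are launched asynchronously and the timer would otherwise stop before the work finishes.

```python
import time
from typing import Callable, Optional


def measure_throughput(
    run_batch: Callable[[int], object],
    batch_size: int,
    warmup: int = 5,
    iters: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Time run_batch(batch_size) and report per-batch and per-sample rates.

    run_batch: hypothetical callable that runs one forward pass on a batch.
    sync: optional callable that blocks until pending device work is done
          (assumption: e.g. torch.cuda.synchronize when timing on a GPU).
    """
    if batch_size < 1 or iters < 1 or warmup < 0:
        raise ValueError("batch_size and iters must be >= 1, warmup >= 0")

    # Warm-up runs are excluded from timing (caches, lazy init, autotuning).
    for _ in range(warmup):
        run_batch(batch_size)
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        run_batch(batch_size)
    if sync is not None:
        sync()  # make sure all queued work is finished before stopping the clock
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")

    batch_latency = elapsed / iters
    return {
        "batch_size": batch_size,
        "batch_latency_s": batch_latency,
        "batches_per_s": 1.0 / batch_latency,
        # "fps per target": samples processed per second, not batches.
        "samples_per_s": batch_size / batch_latency,
    }


def fake_model(batch_size: int) -> None:
    """Stand-in for a pose model: fixed overhead plus a per-sample cost."""
    time.sleep(0.002 + 0.001 * batch_size)


if __name__ == "__main__":
    for bs in (1, 10):
        stats = measure_throughput(fake_model, bs, warmup=2, iters=5)
        print(
            f"batch={stats['batch_size']:>2}  "
            f"latency={stats['batch_latency_s'] * 1e3:.1f} ms  "
            f"samples/s={stats['samples_per_s']:.1f}"
        )
```

Reporting both batch latency and samples per second makes it easier to compare two models at batch size 1 and at batch size 10 on the same footing, since a model can have lower single-image latency yet scale worse with batch size on a given GPU.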