Question about inference precision
Hi there, what a brilliant job! As your document says, the latency on an NVIDIA RTX 2080 Ti with a 1920x1080 image is about 7.3 ms. I'm wondering what precision you were using when testing the latency: float32, float16, or even int8? I cannot find any clues on the website or in the paper. Thanks a lot.
Float32
Thanks for your reply. I've tested on my server with a Tesla T4 GPU, and it takes about 23.3 ms to infer a 1080p image with just two people (I assume the latency measurement includes everything from uploading the image to downloading the result). It still takes about 11.7 ms when inferring in int8 mode. Isn't that weird?
Here are my environment details:
GPU: Tesla T4
TensorRT: 6.0.1
CUDA: V10.1.243
cuDNN: 7.6.3
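In case it helps pin down where the time goes, here is a minimal timing sketch (not from the repo) that separates the host-to-device upload and device-to-host download from the pure GPU compute time. It assumes a TensorRT engine has already been deserialized into `context` with one input and one output binding; the buffer shapes, dtypes, and the engine itself are placeholders, and it uses PyCUDA events for timing.

```python
# Hypothetical timing helper: times only the GPU compute between the
# upload and download copies, averaged over several runs.
import pycuda.autoinit
import pycuda.driver as cuda

def timed_inference(context, host_input, host_output, n_runs=100):
    # Allocate device buffers matching the (placeholder) host arrays.
    d_input = cuda.mem_alloc(host_input.nbytes)
    d_output = cuda.mem_alloc(host_output.nbytes)
    stream = cuda.Stream()
    start, end = cuda.Event(), cuda.Event()

    compute_ms = 0.0
    for _ in range(n_runs):
        cuda.memcpy_htod_async(d_input, host_input, stream)    # upload
        start.record(stream)
        context.execute_async(bindings=[int(d_input), int(d_output)],
                              stream_handle=stream.handle)     # GPU compute only
        end.record(stream)
        cuda.memcpy_dtoh_async(host_output, d_output, stream)  # download
        stream.synchronize()
        compute_ms += start.time_till(end)                     # milliseconds
    return compute_ms / n_runs
```

If the compute-only number is much smaller than the ~23.3 ms / ~11.7 ms you measured, the gap is likely dominated by the copies and host-side pre/post-processing rather than the precision mode.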