Question about inference precision
Hi there, what a brilliant job! As your document says, the latency on an NVIDIA RTX 2080 Ti with a 1920x1080 image is about 7.3 ms. I'm wondering what precision you were using when testing the latency: float32, float16, or even int8? I cannot find any clues on the website or in the paper. Thanks a lot.
Float32
Thanks for your reply. I've tested on my server with a Tesla T4 GPU, and it takes about 23.3 ms to infer a 1080p image with just two people (I assume the latency measurement includes everything from uploading the image to downloading the result). It still takes about 11.7 ms when inferring in int8 mode. Isn't that weird?
Here are my environment details:
GPU: Tesla T4
TensorRT: 6.0.1
CUDA: V10.1.243
cuDNN: 7.6.3
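In case it helps pin down where the time goes, here is a minimal timing sketch (not from the repo) that separates the host-to-device upload and device-to-host download from the pure GPU compute time. It assumes a TensorRT engine has already been deserialized into `context` with one input and one output binding; the buffer shapes, dtypes, and the engine itself are placeholders, and it uses PyCUDA events for timing.

```python
# Hypothetical timing helper: times only the GPU compute between the
# upload and download copies, averaged over several runs.
import pycuda.autoinit
import pycuda.driver as cuda

def timed_inference(context, host_input, host_output, n_runs=100):
    # Allocate device buffers matching the (placeholder) host arrays.
    d_input = cuda.mem_alloc(host_input.nbytes)
    d_output = cuda.mem_alloc(host_output.nbytes)
    stream = cuda.Stream()
    start, end = cuda.Event(), cuda.Event()

    compute_ms = 0.0
    for _ in range(n_runs):
        cuda.memcpy_htod_async(d_input, host_input, stream)    # upload
        start.record(stream)
        context.execute_async(bindings=[int(d_input), int(d_output)],
                              stream_handle=stream.handle)     # GPU compute only
        end.record(stream)
        cuda.memcpy_dtoh_async(host_output, d_output, stream)  # download
        stream.synchronize()
        compute_ms += start.time_till(end)                     # milliseconds
    return compute_ms / n_runs
```

If the compute-only number is much smaller than the ~23.3 ms / ~11.7 ms you measured, the gap is likely dominated by the copies and host-side pre/post-processing rather than the precision mode.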