The inference speed is too slow

Open OutisLi opened this issue 7 months ago • 3 comments

Describe the bug: The inference speed is too slow.

I tried both a 4080 and a 5090 on Linux. The RTF (real-time factor) was nearly the same on both cards, around 0.7. GPU utilization stays below 50%. On the 5090, power draw is only 100+ W, which is too low.
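For anyone trying to reproduce the number, here is a minimal sketch of how an RTF like the one above could be measured. It assumes the usual definition (processing time divided by the duration of the generated audio, so lower is faster). `synthesize` is a hypothetical stand-in for whatever inference call is being timed; it is not this project's API.

```python
import time


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, sample_rate: int, warmup: int = 1, runs: int = 3) -> float:
    """Time a hypothetical `synthesize()` callable that returns a sequence of samples.

    Warm-up runs are excluded so one-time costs (model loading, JIT or
    TensorRT engine setup) do not inflate the result.
    """
    for _ in range(warmup):
        synthesize()

    total_time = 0.0
    total_audio = 0.0
    for _ in range(runs):
        # On a GPU, call torch.cuda.synchronize() before each clock read,
        # otherwise asynchronous kernels may still be running when the timer stops.
        start = time.perf_counter()
        samples = synthesize()
        total_time += time.perf_counter() - start
        total_audio += len(samples) / sample_rate

    return real_time_factor(total_time, total_audio)


if __name__ == "__main__":
    # Dummy synthesizer: pretends to produce one second of 16 kHz audio.
    def fake_synthesize():
        return [0.0] * 16000

    print(f"RTF: {measure_rtf(fake_synthesize, sample_rate=16000):.4f}")
```

If the RTF stays the same across very different GPUs while utilization is low, the GPU is probably not the limiting factor, so it may be worth timing the CPU-side steps separately in the same way.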

I turned on both trt and jit. I am using PyTorch 2.7 with CUDA 12.8/9 installed.

May 05 '25 14:05 OutisLi