TensorRT runs much slower for non-continuous inference
Description
When a sleep() is inserted between inference calls, the inference time becomes much longer:
for (int i = 0; i < 10000; i++)
{
    auto t1 = high_resolution_clock::now();
    context->executeV2(bindings);
    // context->enqueueV2(bindings, stream, nullptr);
    // cudaStreamSynchronize(stream);
    auto t2 = high_resolution_clock::now();
    cout << "iter " << i << " time = " << duration_cast<microseconds>(t2 - t1).count() / 1000.0 << " ms" << endl;
    _sleep(1000);
}
If the _sleep(1000); is removed, the inference time is very stable at roughly 2 ms, no matter how long the loop runs.
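For reference, a minimal sketch of the asynchronous variant hinted at by the commented-out enqueueV2 lines; this is an illustration, not part of the original report, and it assumes a cudaStream_t created beforehand. The cudaStreamSynchronize call is what makes the measured time cover the actual GPU work rather than only the kernel launch:

cudaStream_t stream;
cudaStreamCreate(&stream);

for (int i = 0; i < 10000; i++)
{
    auto t1 = high_resolution_clock::now();
    context->enqueueV2(bindings, stream, nullptr);  // asynchronous launch
    cudaStreamSynchronize(stream);                  // wait for the GPU to finish before timing
    auto t2 = high_resolution_clock::now();
    cout << "iter " << i << " time = " << duration_cast<microseconds>(t2 - t1).count() / 1000.0 << " ms" << endl;
    _sleep(1000);
}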
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: RTX 2070 Super
NVIDIA Driver Version: 516.01
CUDA Version: 11.7
CUDNN Version: 8.4.1.50
Operating System: Windows 11
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce
Load any TensorRT model and compare the following loop with and without the sleep():
for (...)
{
    t1 = clock();
    inference();
    t2 = clock();
    print(t2 - t1);
    sleep();
}
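For concreteness, a self-contained sketch of such a repro loop is shown below. It is not the reporter's original code and makes several assumptions: a prebuilt serialized engine at a placeholder path "model.engine", static input shapes, and FP32 bindings.

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <thread>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Load a serialized engine ("model.engine" is a placeholder path).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(gLogger);
    auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto* context = engine->createExecutionContext();

    // Allocate one device buffer per binding (static shapes and FP32 assumed).
    std::vector<void*> bindings(engine->getNbBindings());
    for (int i = 0; i < engine->getNbBindings(); ++i)
    {
        auto dims = engine->getBindingDimensions(i);
        size_t count = 1;
        for (int d = 0; d < dims.nbDims; ++d) count *= dims.d[d];
        cudaMalloc(&bindings[i], count * sizeof(float));
    }

    using namespace std::chrono;
    for (int i = 0; i < 100; ++i)
    {
        auto t1 = high_resolution_clock::now();
        context->executeV2(bindings.data());  // synchronous inference
        auto t2 = high_resolution_clock::now();
        std::cout << "iter " << i << " time = "
                  << duration_cast<microseconds>(t2 - t1).count() / 1000.0 << " ms" << std::endl;
        std::this_thread::sleep_for(seconds(1));  // comment out this line to compare
    }
    return 0;
}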
This is usually caused by the GPU going idle and is not an issue in TRT. Could you check the solution in https://github.com/NVIDIA/TensorRT/issues/2042 and see if it works?
Thanks for your reply, it works! However, is there anything I can do in the C++ code to prevent the GPU from going to sleep when idle?
No, it's mostly caused by the GPU driver, so you need to change the driver settings.
OK, thanks a lot for your help!
Hi again, I just did the same experiment in PyTorch, and I found the inference time is quite stable regardless of how long I make it sleep. Do you have any idea why the behavior is different for PyTorch and TensorRT?
for j in range(10000):
    with torch.inference_mode():
        t1 = time.time()
        pred = model(img)
        print("iter", j, "time =", (time.time() - t1) * 1000, "ms")
    time.sleep(1)
Thanks!
@Apeiria Is PyTorch using the GPU or the CPU? Also, could you try time.sleep(5) and see if it makes any difference?
My PyTorch is using the GPU.
I tried time.sleep(5) and it seems the same; below is the result.
Interesting... we will take a look when we have the chance. Meanwhile, I still recommend that you try the approaches in https://github.com/NVIDIA/TensorRT/issues/2042 to see if any of them fixes the issue.
Thanks a lot, I tried those approaches and nvidia-smi -lgc worked to some extent. After locking the frequency to the maximum, the inference time still increased compared to the version without sleep, but less dramatically (before locking: 2 ms -> 12 ms; after: 2 ms -> 4 ms). However, I'm only able to change the driver settings on my own computer, not on the machines the model will be deployed to, so unfortunately those approaches won't solve my problem (T_T).
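One idea that could be tried purely in application code (it is not suggested anywhere in this thread, and it is not guaranteed to help, since the driver ultimately controls the clocks): keep a trivial amount of work flowing to the GPU during the idle gaps, for example a small cudaMemsetAsync heartbeat from a background thread, so the card is never completely idle between inferences. Treat this as an experiment, not a fix.

#include <atomic>
#include <chrono>
#include <thread>
#include <cuda_runtime_api.h>

std::atomic<bool> keepWarm{true};

// Background heartbeat: touch the GPU every ~100 ms while the main loop sleeps.
void gpuHeartbeat()
{
    void* scratch = nullptr;
    cudaMalloc(&scratch, 256);  // tiny scratch buffer
    cudaStream_t hbStream;
    cudaStreamCreate(&hbStream);
    while (keepWarm.load())
    {
        cudaMemsetAsync(scratch, 0, 256, hbStream);  // trivial GPU work
        cudaStreamSynchronize(hbStream);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    cudaStreamDestroy(hbStream);
    cudaFree(scratch);
}

// Usage: start the thread before the inference loop and stop it afterwards.
// std::thread hb(gpuHeartbeat);
// ... inference loop with sleeps ...
// keepWarm = false; hb.join();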
I fixed this issue, but it cost me a lot of time, so I would like the TensorRT team to give me a reward before I announce the solution.
Mailbox: [email protected]