TensorRT runs much slower for non-continuous inference
Description
When a sleep() is inserted between inference calls, the inference time becomes much longer:
for (int i = 0; i < 10000; i++)
{
    auto t1 = high_resolution_clock::now();
    context->executeV2(bindings);
    // context->enqueueV2(bindings, stream, nullptr);
    // cudaStreamSynchronize(stream);
    auto t2 = high_resolution_clock::now();
    cout << "iter " << i << " time = " << duration_cast<microseconds>(t2 - t1).count() / 1000.0 << " ms" << endl;
    _sleep(1000);
}
If the _sleep(1000); is removed, the inference time is very stable at roughly 2 ms, no matter how long the loop runs.
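For reference, a minimal sketch of the asynchronous variant hinted at by the commented-out enqueueV2 lines; this is an illustration, not part of the original report, and it assumes a cudaStream_t created beforehand. The cudaStreamSynchronize call is what makes the measured time cover the actual GPU work rather than only the kernel launch:

cudaStream_t stream;
cudaStreamCreate(&stream);

for (int i = 0; i < 10000; i++)
{
    auto t1 = high_resolution_clock::now();
    context->enqueueV2(bindings, stream, nullptr);  // asynchronous launch
    cudaStreamSynchronize(stream);                  // wait for the GPU to finish before timing
    auto t2 = high_resolution_clock::now();
    cout << "iter " << i << " time = " << duration_cast<microseconds>(t2 - t1).count() / 1000.0 << " ms" << endl;
    _sleep(1000);
}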
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: RTX 2070 Super
NVIDIA Driver Version: 516.01
CUDA Version: 11.7
CUDNN Version: 8.4.1.50
Operating System: Windows 11
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce
Load any TensorRT model and compare the following loop with and without the sleep():
for (...)
{
    t1 = clock();
    inference();
    t2 = clock();
    print(t2 - t1);
    sleep();
}
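For concreteness, a self-contained sketch of such a repro loop is shown below. It is not the reporter's original code and makes several assumptions: a prebuilt serialized engine at a placeholder path "model.engine", static input shapes, and FP32 bindings.

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <thread>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Load a serialized engine ("model.engine" is a placeholder path).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(gLogger);
    auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto* context = engine->createExecutionContext();

    // Allocate one device buffer per binding (static shapes and FP32 assumed).
    std::vector<void*> bindings(engine->getNbBindings());
    for (int i = 0; i < engine->getNbBindings(); ++i)
    {
        auto dims = engine->getBindingDimensions(i);
        size_t count = 1;
        for (int d = 0; d < dims.nbDims; ++d) count *= dims.d[d];
        cudaMalloc(&bindings[i], count * sizeof(float));
    }

    using namespace std::chrono;
    for (int i = 0; i < 100; ++i)
    {
        auto t1 = high_resolution_clock::now();
        context->executeV2(bindings.data());  // synchronous inference
        auto t2 = high_resolution_clock::now();
        std::cout << "iter " << i << " time = "
                  << duration_cast<microseconds>(t2 - t1).count() / 1000.0 << " ms" << std::endl;
        std::this_thread::sleep_for(seconds(1));  // comment out this line to compare
    }
    return 0;
}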
This is usually caused by the GPU going idle and is not an issue in TRT. Could you check the solution in https://github.com/NVIDIA/TensorRT/issues/2042 and see if it works?
Thanks for your reply, it works! However, is there anything I can do in the C++ code to prevent the GPU from going to sleep when idle?
No, it's mostly caused by the GPU driver, so you need to change the driver settings.
OK, thanks a lot for your help!
Hi again, I just did the same experiment in PyTorch, and I found the inference time is quite stable regardless of how long I make it sleep. Do you have any idea why the behavior is different for PyTorch and TensorRT?
for j in range(10000):
    with torch.inference_mode():
        t1 = time.time()
        pred = model(img)
        print("iter", j, "time =", (time.time() - t1) * 1000, "ms")
    time.sleep(1)
Thanks!
@Apeiria Is PyTorch using the GPU or the CPU? Also, could you try time.sleep(5) and see if it makes any difference?
My PyTorch is using the GPU.
I tried time.sleep(5) and it seems the same; below is the result.
Interesting... we will take a look when we have the chance. Meanwhile, I still recommend that you try the approaches in https://github.com/NVIDIA/TensorRT/issues/2042 to see if any of them fixes the issue.
Thanks a lot, I tried those approaches and nvidia-smi -lgc worked to some extent. After locking the frequency to the maximum, the inference time still increased compared to the version without sleep, but less dramatically (before locking: 2 ms -> 12 ms; after: 2 ms -> 4 ms). However, I'm only able to change the driver settings on my own computer, not on the machines the model will be deployed to, so unfortunately those approaches won't solve my problem (T_T).
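One idea that could be tried purely in application code (it is not suggested anywhere in this thread, and it is not guaranteed to help, since the driver ultimately controls the clocks): keep a trivial amount of work flowing to the GPU during the idle gaps, for example a small cudaMemsetAsync heartbeat from a background thread, so the card is never completely idle between inferences. Treat this as an experiment, not a fix.

#include <atomic>
#include <chrono>
#include <thread>
#include <cuda_runtime_api.h>

std::atomic<bool> keepWarm{true};

// Background heartbeat: touch the GPU every ~100 ms while the main loop sleeps.
void gpuHeartbeat()
{
    void* scratch = nullptr;
    cudaMalloc(&scratch, 256);  // tiny scratch buffer
    cudaStream_t hbStream;
    cudaStreamCreate(&hbStream);
    while (keepWarm.load())
    {
        cudaMemsetAsync(scratch, 0, 256, hbStream);  // trivial GPU work
        cudaStreamSynchronize(hbStream);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    cudaStreamDestroy(hbStream);
    cudaFree(scratch);
}

// Usage: start the thread before the inference loop and stop it afterwards.
// std::thread hb(gpuHeartbeat);
// ... inference loop with sleeps ...
// keepWarm = false; hb.join();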
I fixed this issue, but it cost me a lot of time, so I would like the TensorRT team to give me a reward before I announce the solution.
Mailbox: [email protected]