torch2trt
Processing the outputs costs more time, why?
output = model_trt(input)
output = output.cpu()  # costs about 0.2s, why?
hello, did you solve this problem? I have the same problem.
You are explicitly copying tensors from GPU to CPU, hence the cost of the transfer.
Is there any way to solve this problem?
I mean, if you really want to copy a tensor from GPU to CPU, then you will have to incur some transfer time.
Hi All,
I'd guess the line actually includes the model execution time. This has to do with how PyTorch launches work on GPU. When you call
output = model_trt(input)
The GPU task is launched, but the Python interpreter actually moves forward before the execution has finished. When you call
output = output.cpu()
An explicit synchronization occurs (Python waits for the GPU task to finish) before transferring the data to the CPU.
Depending on how you're measuring the time of each line, this time might actually include the model execution time itself.
To avoid this, you can add explicit synchronization
output = model_trt(input)
torch.cuda.current_stream().synchronize() # wait for model execution to finish
output = output.cpu() # <-- now actually represents GPU->CPU transfer time
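To see why the timing shifts like this without needing a GPU, here is a plain-Python analogy of asynchronous kernel launch (the `AsyncModel` class and its timings are illustrative, not part of torch2trt): the call returns immediately while the work runs in the background, and the first blocking call pays for the deferred work.

```python
import threading
import time

class AsyncModel:
    """Stand-in for a CUDA model: calling it returns immediately
    while the simulated 'GPU work' runs on a background thread."""
    def __init__(self, work_seconds=0.2):
        self.work_seconds = work_seconds
        self._done = threading.Event()

    def __call__(self, x):
        self._done.clear()
        def work():
            time.sleep(self.work_seconds)  # simulated kernel execution
            self._done.set()
        threading.Thread(target=work).start()
        return x  # returns before the work finishes, like a kernel launch

    def synchronize(self):
        # analogous to torch.cuda.current_stream().synchronize()
        self._done.wait()

model = AsyncModel(work_seconds=0.2)

t0 = time.perf_counter()
out = model(1)
launch_time = time.perf_counter() - t0   # tiny: only the launch overhead

t0 = time.perf_counter()
model.synchronize()                      # first blocking call pays here
sync_time = time.perf_counter() - t0     # ~0.2s of deferred work

print(f"launch: {launch_time:.4f}s, sync: {sync_time:.4f}s")
```

The same effect makes `output.cpu()` look expensive in the real code: timed per line, nearly all of the model's 0.2s execution is billed to the first synchronizing call rather than to the launch.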
Hope this helps clarify.
Best, John