
Dealing with the outputs costs more time, why?

Open luyitry opened this issue 2 years ago • 5 comments

output = model_trt(input)
output = output.cpu()  # costs about 0.2 s, why?

luyitry avatar Mar 31 '22 02:03 luyitry

hello, did you solve this problem? I have the same problem.

huangqingyi-code avatar Apr 15 '22 12:04 huangqingyi-code

You are explicitly copying tensors from the GPU to the CPU, hence the cost of the transfer.

SrivastavaKshitij avatar Apr 18 '22 19:04 SrivastavaKshitij

Is there any way to solve this problem?

huangqingyi-code avatar Apr 19 '22 03:04 huangqingyi-code

I mean that if you really want to copy a tensor from the GPU to the CPU, then you will have to incur some transfer time.

SrivastavaKshitij avatar Apr 22 '22 15:04 SrivastavaKshitij

Hi All,

I'd guess the line actually includes the model execution time. This has to do with how PyTorch launches work on the GPU. When you call

output = model_trt(input)

the GPU task is launched, but the Python interpreter actually moves forward before the execution has finished. When you call

output = output.cpu()

an explicit synchronization occurs (Python waits for the GPU task to finish) before transferring the data to the CPU.
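This launch-now, wait-later behavior can be illustrated without a GPU. The sketch below (my analogy, not torch2trt code) uses a thread pool: submitting work returns immediately, and the wait is paid at the first point that needs the result, just as `.cpu()` must wait for the pending kernel.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)

def gpu_like_work():
    time.sleep(0.2)   # stand-in for ~0.2 s of kernel execution
    return "output"

t0 = time.perf_counter()
future = executor.submit(gpu_like_work)   # "launch": returns immediately
launch_time = time.perf_counter() - t0

t0 = time.perf_counter()
result = future.result()                  # "synchronize": blocks here
wait_time = time.perf_counter() - t0

print(f"launch: {launch_time:.3f}s, wait: {wait_time:.3f}s")
# The launch is near-instant; the 0.2 s shows up at the sync point.
```

The same accounting happens with CUDA: the model call returns quickly, and the execution time is billed to whichever later line forces a synchronization.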

Depending on how you're measuring the time of each line, this time might actually include the model execution time itself.

To avoid this, you can add explicit synchronization

output = model_trt(input)
torch.cuda.current_stream().synchronize()  # wait for model execution to finish
output = output.cpu()  # <-- now actually represents the GPU->CPU transfer time

Hope this helps clarify.

Best, John

jaybdub avatar May 05 '22 19:05 jaybdub