torch2trt
Processing the outputs costs more time, why?
output = model_trt(input)
output = output.cpu()  # costs about 0.2s, why?
hello, did you solve this problem? I have the same problem.
You are explicitly copying tensors from GPU to CPU, hence the cost of the transfer.
Is there any way to solve this problem?
I mean, if you really want to copy a tensor from GPU to CPU, then you will have to incur some transfer time.
Hi All,
I'd guess the line actually includes the model execution time. This has to do with how PyTorch launches work on GPU. When you call
output = model_trt(input)
The GPU task is launched, but the Python interpreter actually moves forward before the execution has finished. When you call
output = output.cpu()
An explicit synchronization occurs (Python waits for the GPU task to finish) before transferring the data to the CPU.
Depending on how you're measuring the time of each line, this time might actually include the model execution time itself.
To avoid this, you can add explicit synchronization
output = model_trt(input)
torch.cuda.current_stream().synchronize() # wait for model execution to finish
output = output.cpu() # <-- now actually represents GPU->CPU transfer time
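To see why the timing shifts like this without needing a GPU, here is a plain-Python analogy of asynchronous kernel launch (the `AsyncModel` class and its timings are illustrative, not part of torch2trt): the call returns immediately while the work runs in the background, and the first blocking call pays for the deferred work.

```python
import threading
import time

class AsyncModel:
    """Stand-in for a CUDA model: calling it returns immediately
    while the simulated 'GPU work' runs on a background thread."""
    def __init__(self, work_seconds=0.2):
        self.work_seconds = work_seconds
        self._done = threading.Event()

    def __call__(self, x):
        self._done.clear()
        def work():
            time.sleep(self.work_seconds)  # simulated kernel execution
            self._done.set()
        threading.Thread(target=work).start()
        return x  # returns before the work finishes, like a kernel launch

    def synchronize(self):
        # analogous to torch.cuda.current_stream().synchronize()
        self._done.wait()

model = AsyncModel(work_seconds=0.2)

t0 = time.perf_counter()
out = model(1)
launch_time = time.perf_counter() - t0   # tiny: only the launch overhead

t0 = time.perf_counter()
model.synchronize()                      # first blocking call pays here
sync_time = time.perf_counter() - t0     # ~0.2s of deferred work

print(f"launch: {launch_time:.4f}s, sync: {sync_time:.4f}s")
```

The same effect makes `output.cpu()` look expensive in the real code: timed per line, nearly all of the model's 0.2s execution is billed to the first synchronizing call rather than to the launch.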
Hope this helps clarify.
Best, John