libtorch-yolov3-deepsort

Transferring torch::Tensor from CUDA to CPU is slow

Open datlt4 opened this issue 4 years ago • 0 comments

While testing your repo, I found a problem at line 72 of nn_matching.h. When nn_cosine_distance is called in the for loop (line 55), the .cpu() call takes more than 22,000 microseconds when i == 0, but only 10-20 microseconds for every other index. If there is a way to reduce the cost of that first call, performance would improve dramatically.
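One possible explanation, which is an assumption on my side and not something I have confirmed in this repo: CUDA kernels are launched asynchronously, so the first .cpu() call may block until all previously queued GPU work has finished, and a timer placed only around .cpu() gets charged for that wait. The sketch below shows the measurement pattern with the standard library only. `timed_us`, `launch_fake_kernel`, `measure_without_sync` and `measure_with_sync` are hypothetical names, and the "kernel" is simulated with a sleeping thread rather than real GPU work.

```cpp
#include <chrono>
#include <cstdint>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper (not from the repository): runs `sync` first so that any
// previously queued work finishes before the clock starts, then times `op`.
template <typename Sync, typename Op>
std::int64_t timed_us(Sync&& sync, Op&& op) {
    std::forward<Sync>(sync)();  // drain pending work outside the timed region
    const auto start = std::chrono::steady_clock::now();
    std::forward<Op>(op)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Stand-in for asynchronously launched GPU work: sleeps on another thread.
inline std::future<void> launch_fake_kernel(std::chrono::milliseconds d) {
    return std::async(std::launch::async, [d] { std::this_thread::sleep_for(d); });
}

// No sync before timing: the blocking "copy" (waiting on the future) is
// charged for the whole pending kernel, like the i == 0 case in the report.
inline std::int64_t measure_without_sync(std::chrono::milliseconds d) {
    auto pending = launch_fake_kernel(d);
    return timed_us([] {}, [&] { pending.wait(); });
}

// Sync before timing: the pending kernel is drained first, so only the
// cost of the blocking call itself is measured.
inline std::int64_t measure_with_sync(std::chrono::milliseconds d) {
    auto pending = launch_fake_kernel(d);
    return timed_us([&] { pending.wait(); }, [&] { pending.wait(); });
}
```

With libtorch, the `sync` argument would be a call such as `torch::cuda::synchronize()` and `op` would be the `.cpu()` call at line 72. If the 22,000 microseconds disappear from the `.cpu()` timing after synchronizing, the time is being spent in the preceding GPU computation and not in the transfer itself.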

datlt4 · Oct 13 '21 07:10