
Scaled-YOLOv4: high initial inference time

Open saimouli opened this issue 2 years ago • 1 comments

Env

  • GPU: RTX 2080 Ti
  • OS: Ubuntu 16.04
  • CUDA version: 10.2
  • TensorRT version: 7.4

About this repo

  • which branch/tag/commit are you using? Master
  • which model? scaled-yolo4

Your problem

  • what is your command? sudo ./yolov4csp -d ../samples
  • what's your output?
enqueue 416.22 ms
cudaStreamSynchronize 0.003422 ms
420ms
304.34, 213.305, 16.8873, 50.1413, 0.895157, 0, 0.991489, 
144.34, 178.38, 47.5342, 17.4259, 0.895705, 2, 0.984215, 
444.259, 245.162, 133.524, 70.393, 0.872347, 2, 0.99593, 
458.208, 274.328, 106.951, 88.3923, 0.864996, 2, 0.991907, 
378.928, 219.612, 74.8686, 42.8357, 0.776623, 2, 0.98347, 
234.093, 162.487, 14.236, 10.2134, 0.750553, 2, 0.953275, 
465.514, 321.583, 93.353, 150.152, 0.718397, 2, 0.98752, 
363.084, 208.169, 75.7036, 38.7428, 0.671901, 2, 0.981167, 
322.546, 190.248, 97.9998, 38.3507, 0.652684, 2, 0.619701, 
101.227, 203.749, 35.0163, 21.3945, 0.626124, 2, 0.987616, 
enqueue 1.75403 ms
cudaStreamSynchronize 0.003515 ms
19ms
377.258, 265.605, 28.9524, 91.3374, 0.899652, 0, 0.998451, 
209.202, 240.879, 84.4317, 63.8479, 0.920647, 2, 0.98987, 
144.059, 178.172, 48.0174, 18.1125, 0.885829, 2, 0.983723, 
458.718, 274.427, 106.654, 87.95, 0.866359, 2, 0.98652, 
451.525, 242.98, 122.015, 68.9903, 0.76769, 2, 0.994225, 
314.558, 189.053, 79.0575, 39.2027, 0.763484, 2, 0.783326, 
428.219, 222.473, 161.346, 50.4856, 0.704973, 2, 0.992093, 
366.088, 211.274, 81.0008, 42.3466, 0.674908, 2, 0.986258, 
103.196, 203.9, 30.8972, 21.2797, 0.63341, 2, 0.990404, 
465.782, 324.95, 91.8778, 141.123, 0.593609, 2, 0.985215, 
324.922, 198.884, 52.3608, 30.4711, 0.565539, 2, 0.841305, 
enqueue 1.76401 ms
cudaStreamSynchronize 0.002985 ms
22ms
414.547, 301.517, 67.1701, 232.498, 0.936981, 0, 0.997469, 
135.549, 307.328, 94.6661, 238.689, 0.929696, 0, 0.996901, 
198.119, 299.354, 58.194, 213.746, 0.901227, 0, 0.954647, 
82.829, 337.161, 37.0463, 152.874, 0.679604, 0, 0.997766, 
259.538, 230.172, 367.234, 244.545, 0.916708, 5, 0.98943, 
enqueue 1.74363 ms
cudaStreamSynchronize 0.002334 ms
  • what output do you expect? The first inference takes ~400 ms, while later inferences settle at ~20 ms. Most of the initial time is spent in the context.enqueue call in the inference method.

saimouli avatar Jun 22 '22 13:06 saimouli

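This is normal: the GPU needs a warm-up. The first inference pays one-time initialization costs, so it should be excluded when measuring latency.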

wang-xinyu avatar Jun 30 '22 03:06 wang-xinyu
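
The warm-up advice above can be sketched as a small timing helper: run a few untimed-for-reporting iterations first, then average only the later ones. This is a minimal sketch, not code from the repo. The names `LatencyStats`, `time_once_ms`, and `benchmark_with_warmup` are hypothetical, and `run_inference` is a stand-in for whatever wraps the program's inference call (it is assumed to block until the GPU work has finished, for example by synchronizing the stream before returning).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Per-iteration latencies in milliseconds (hypothetical helper type).
struct LatencyStats {
    std::vector<double> warmup_ms;  // first runs, kept only for inspection
    std::vector<double> timed_ms;   // steady-state runs used for reporting

    // Mean latency of the steady-state runs; 0.0 if there are none.
    double mean_timed_ms() const {
        if (timed_ms.empty()) return 0.0;
        const double sum = std::accumulate(timed_ms.begin(), timed_ms.end(), 0.0);
        return sum / static_cast<double>(timed_ms.size());
    }
};

// Wall-clock time of one call to fn, in milliseconds.
inline double time_once_ms(const std::function<void()>& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Runs run_inference warmup_iters times, then timed_iters more times.
// Assumption: run_inference returns only after the GPU work is complete,
// otherwise the measured time covers just the launch, not the inference.
inline LatencyStats benchmark_with_warmup(const std::function<void()>& run_inference,
                                          std::size_t warmup_iters,
                                          std::size_t timed_iters) {
    assert(run_inference && "run_inference must be a valid callable");
    LatencyStats stats;
    for (std::size_t i = 0; i < warmup_iters; ++i) {
        stats.warmup_ms.push_back(time_once_ms(run_inference));
    }
    for (std::size_t i = 0; i < timed_iters; ++i) {
        stats.timed_ms.push_back(time_once_ms(run_inference));
    }
    return stats;
}
```

In the program from this issue, the callable would wrap the existing inference call, and `mean_timed_ms()` would then report the steady-state figure (roughly the ~20 ms seen above) rather than being skewed by the ~400 ms first run. In a real-time application the same idea applies: run one or two dummy inferences at startup, before the first real frame arrives.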

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 31 '22 02:08 stale[bot]