tensorrtx Scaled yolo high initial inference time

Env

GPU, RTX2080TI
OS, Ubuntu16.04
Cuda version 10.2
TensorRT version 7.4

About this repo

which branch/tag/commit are you using? Master
which model? scaled-yolo4

Your problem

what is your command? sudo ./yolov4csp -d ../samples
what's your output?

enqueue 416.22 ms
cudaStreamSynchronize 0.003422 ms
420ms
304.34, 213.305, 16.8873, 50.1413, 0.895157, 0, 0.991489, 
144.34, 178.38, 47.5342, 17.4259, 0.895705, 2, 0.984215, 
444.259, 245.162, 133.524, 70.393, 0.872347, 2, 0.99593, 
458.208, 274.328, 106.951, 88.3923, 0.864996, 2, 0.991907, 
378.928, 219.612, 74.8686, 42.8357, 0.776623, 2, 0.98347, 
234.093, 162.487, 14.236, 10.2134, 0.750553, 2, 0.953275, 
465.514, 321.583, 93.353, 150.152, 0.718397, 2, 0.98752, 
363.084, 208.169, 75.7036, 38.7428, 0.671901, 2, 0.981167, 
322.546, 190.248, 97.9998, 38.3507, 0.652684, 2, 0.619701, 
101.227, 203.749, 35.0163, 21.3945, 0.626124, 2, 0.987616, 
enqueue 1.75403 ms
cudaStreamSynchronize 0.003515 ms
19ms
377.258, 265.605, 28.9524, 91.3374, 0.899652, 0, 0.998451, 
209.202, 240.879, 84.4317, 63.8479, 0.920647, 2, 0.98987, 
144.059, 178.172, 48.0174, 18.1125, 0.885829, 2, 0.983723, 
458.718, 274.427, 106.654, 87.95, 0.866359, 2, 0.98652, 
451.525, 242.98, 122.015, 68.9903, 0.76769, 2, 0.994225, 
314.558, 189.053, 79.0575, 39.2027, 0.763484, 2, 0.783326, 
428.219, 222.473, 161.346, 50.4856, 0.704973, 2, 0.992093, 
366.088, 211.274, 81.0008, 42.3466, 0.674908, 2, 0.986258, 
103.196, 203.9, 30.8972, 21.2797, 0.63341, 2, 0.990404, 
465.782, 324.95, 91.8778, 141.123, 0.593609, 2, 0.985215, 
324.922, 198.884, 52.3608, 30.4711, 0.565539, 2, 0.841305, 
enqueue 1.76401 ms
cudaStreamSynchronize 0.002985 ms
22ms
414.547, 301.517, 67.1701, 232.498, 0.936981, 0, 0.997469, 
135.549, 307.328, 94.6661, 238.689, 0.929696, 0, 0.996901, 
198.119, 299.354, 58.194, 213.746, 0.901227, 0, 0.954647, 
82.829, 337.161, 37.0463, 152.874, 0.679604, 0, 0.997766, 
259.538, 230.172, 367.234, 244.545, 0.916708, 5, 0.98943, 
enqueue 1.74363 ms
cudaStreamSynchronize 0.002334 ms

what output do you expect? The first inference takes ~400 ms, while later inferences settle at ~20 ms. Almost all of the initial time is spent in the context.enqueue call in the inference method.

Jun 22 '22 13:06 saimouli

GPU needs warmup. This is normal.
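
A common way to keep the cold first call out of latency measurements is to run a few untimed warmup iterations before starting the clock. The block below is only a minimal sketch of that pattern and is not code from the repository: `time_with_warmup`, `WarmupTiming`, and `FakeInference` are hypothetical names, and `FakeInference` uses sleeps to imitate a one-time initialization cost so the example runs without a GPU. In a real program the callable would presumably wrap the inference call (the `context.enqueue` line mentioned above) followed by `cudaStreamSynchronize`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Timing summary: the first (cold) call is reported separately from the steady state.
struct WarmupTiming {
    double first_call_ms;   // duration of the very first call
    double steady_mean_ms;  // mean duration of the timed calls made after warmup
};

// Runs run_once() warmup_iters times (only the first of these is timed),
// then times timed_iters further calls and returns their mean duration.
template <typename F>
WarmupTiming time_with_warmup(F&& run_once, int warmup_iters, int timed_iters) {
    assert(warmup_iters > 0 && timed_iters > 0);
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    // First call: includes any one-time setup cost.
    auto t0 = clock::now();
    run_once();
    double first_ms = ms(clock::now() - t0).count();

    // Remaining warmup calls are executed but not measured.
    for (int i = 1; i < warmup_iters; ++i) run_once();

    // Steady-state measurement.
    auto t1 = clock::now();
    for (int i = 0; i < timed_iters; ++i) run_once();
    double mean_ms = ms(clock::now() - t1).count() / timed_iters;

    return {first_ms, mean_ms};
}

// Hypothetical stand-in for an inference call (assumption, not the repo's code):
// the first invocation pays an extra one-time cost, later ones do not.
struct FakeInference {
    bool initialized = false;
    int calls = 0;

    void operator()() {
        ++calls;
        if (!initialized) {
            // Simulated one-time initialization.
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            initialized = true;
        }
        // Simulated per-call work.
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
    }
};
```

Calling `time_with_warmup(fake, 3, 5)` on a `FakeInference fake` makes eight calls in total and reports the slow first call separately from the steady-state mean, which mirrors the ~400 ms versus ~20 ms pattern in the log above.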

Jun 30 '22 03:06 wang-xinyu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Aug 31 '22 02:08 stale[bot]
