tensorrtx
Scaled yolo high initial inference time
Env
- GPU, RTX2080TI
- OS, Ubuntu16.04
- Cuda version 10.2
- TensorRT version 7.4
About this repo
- which branch/tag/commit are you using? Master
- which model? scaled-yolo4
Your problem
- what is your command?
sudo ./yolov4csp -d ../samples
- what's your output?
enqueue 416.22 ms
cudaStreamSynchronize 0.003422 ms
420ms
304.34, 213.305, 16.8873, 50.1413, 0.895157, 0, 0.991489,
144.34, 178.38, 47.5342, 17.4259, 0.895705, 2, 0.984215,
444.259, 245.162, 133.524, 70.393, 0.872347, 2, 0.99593,
458.208, 274.328, 106.951, 88.3923, 0.864996, 2, 0.991907,
378.928, 219.612, 74.8686, 42.8357, 0.776623, 2, 0.98347,
234.093, 162.487, 14.236, 10.2134, 0.750553, 2, 0.953275,
465.514, 321.583, 93.353, 150.152, 0.718397, 2, 0.98752,
363.084, 208.169, 75.7036, 38.7428, 0.671901, 2, 0.981167,
322.546, 190.248, 97.9998, 38.3507, 0.652684, 2, 0.619701,
101.227, 203.749, 35.0163, 21.3945, 0.626124, 2, 0.987616,
enqueue 1.75403 ms
cudaStreamSynchronize 0.003515 ms
19ms
377.258, 265.605, 28.9524, 91.3374, 0.899652, 0, 0.998451,
209.202, 240.879, 84.4317, 63.8479, 0.920647, 2, 0.98987,
144.059, 178.172, 48.0174, 18.1125, 0.885829, 2, 0.983723,
458.718, 274.427, 106.654, 87.95, 0.866359, 2, 0.98652,
451.525, 242.98, 122.015, 68.9903, 0.76769, 2, 0.994225,
314.558, 189.053, 79.0575, 39.2027, 0.763484, 2, 0.783326,
428.219, 222.473, 161.346, 50.4856, 0.704973, 2, 0.992093,
366.088, 211.274, 81.0008, 42.3466, 0.674908, 2, 0.986258,
103.196, 203.9, 30.8972, 21.2797, 0.63341, 2, 0.990404,
465.782, 324.95, 91.8778, 141.123, 0.593609, 2, 0.985215,
324.922, 198.884, 52.3608, 30.4711, 0.565539, 2, 0.841305,
enqueue 1.76401 ms
cudaStreamSynchronize 0.002985 ms
22ms
414.547, 301.517, 67.1701, 232.498, 0.936981, 0, 0.997469,
135.549, 307.328, 94.6661, 238.689, 0.929696, 0, 0.996901,
198.119, 299.354, 58.194, 213.746, 0.901227, 0, 0.954647,
82.829, 337.161, 37.0463, 152.874, 0.679604, 0, 0.997766,
259.538, 230.172, 367.234, 244.545, 0.916708, 5, 0.98943,
enqueue 1.74363 ms
cudaStreamSynchronize 0.002334 ms
- what output do you expect?
The first inference takes ~400 ms, and later inferences converge to ~20 ms. Most of the initial time is spent in the context.enqueue call in the inference method.
The GPU needs a warm-up. This is normal.
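One common way to keep the slow first call out of the measurements is to run a few untimed inferences before timing starts. The following is only a minimal sketch of that idea, not code from this repo: timeAfterWarmup and its parameters are hypothetical names, and the callable is assumed to wrap whatever one inference does in your program (for the output above, presumably the context.enqueue call followed by cudaStreamSynchronize).

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of tensorrtx): calls `infer` `warmup` times
// without timing, then returns the latency in milliseconds of each of the
// next `timed` calls.
template <typename Infer>
std::vector<double> timeAfterWarmup(Infer&& infer, std::size_t warmup, std::size_t timed) {
    // Untimed warm-up calls absorb any one-time startup cost.
    for (std::size_t i = 0; i < warmup; ++i) {
        infer();
    }

    std::vector<double> ms;
    ms.reserve(timed);
    for (std::size_t i = 0; i < timed; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        infer();  // assumed to block until the inference has finished
        auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}
```

For example, timeAfterWarmup(runOnce, 3, 100) would discard three calls and report the next hundred, where runOnce is a placeholder for your own inference lambda. The callable needs to synchronize the stream itself; otherwise only the launch time is measured, not the full inference.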