Lidar_AI_Solution

Inference time 25 FPS

Open dav695 opened this issue 1 year ago • 7 comments

Hello! I am trying to reach the 25 FPS on the Orin by following the steps in the README. I am using the models from the zip (ResNet50 INT8 ONNX and PTQ models), but with them the inference rate I get is only around 17 Hz:

==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 	1.51542 ms
[⏰ [NoSt] ImageNrom]: 	7.68278 ms
[⏰ Lidar Backbone]: 	48.93895 ms
[⏰ Camera Depth]: 	0.15978 ms
[⏰ Camera Backbone]: 	18.95363 ms
[⏰ Camera Bevpool]: 	1.99872 ms
[⏰ VTransform]: 	2.18362 ms
[⏰ Transfusion]: 	15.96915 ms
[⏰ Head BoundingBox]: 	17.20211 ms
Total: 105.406 ms
=============================================
==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 	0.85197 ms
[⏰ [NoSt] ImageNrom]: 	8.72378 ms
[⏰ Lidar Backbone]: 	18.55526 ms
[⏰ Camera Depth]: 	0.10973 ms
[⏰ Camera Backbone]: 	14.36374 ms
[⏰ Camera Bevpool]: 	1.55261 ms
[⏰ VTransform]: 	1.65760 ms
[⏰ Transfusion]: 	7.19910 ms
[⏰ Head BoundingBox]: 	11.77533 ms
Total: 55.213 ms
=============================================
==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 	0.56618 ms
[⏰ [NoSt] ImageNrom]: 	5.23469 ms
[⏰ Lidar Backbone]: 	20.24752 ms
[⏰ Camera Depth]: 	0.10736 ms
[⏰ Camera Backbone]: 	12.72925 ms
[⏰ Camera Bevpool]: 	1.56182 ms
[⏰ VTransform]: 	1.64019 ms
[⏰ Transfusion]: 	8.37146 ms
[⏰ Head BoundingBox]: 	10.38650 ms
Total: 55.044 ms
=============================================
==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 	1.48154 ms
[⏰ [NoSt] ImageNrom]: 	5.11184 ms
[⏰ Lidar Backbone]: 	19.40304 ms
[⏰ Camera Depth]: 	0.10710 ms
[⏰ Camera Backbone]: 	15.52259 ms
[⏰ Camera Bevpool]: 	1.52963 ms
[⏰ VTransform]: 	1.67616 ms
[⏰ Transfusion]: 	8.33219 ms
[⏰ Head BoundingBox]: 	14.06886 ms
Total: 60.640 ms
=============================================
==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 	0.38928 ms
[⏰ [NoSt] ImageNrom]: 	3.36355 ms
[⏰ Lidar Backbone]: 	18.56685 ms
[⏰ Camera Depth]: 	0.10810 ms
[⏰ Camera Backbone]: 	14.83594 ms
[⏰ Camera Bevpool]: 	1.55382 ms
[⏰ VTransform]: 	1.70230 ms
[⏰ Transfusion]: 	7.22714 ms
[⏰ Head BoundingBox]: 	14.19894 ms
Total: 58.193 ms
=============================================
==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 	0.38029 ms
[⏰ [NoSt] ImageNrom]: 	5.25728 ms
[⏰ Lidar Backbone]: 	17.51168 ms
[⏰ Camera Depth]: 	0.10851 ms
[⏰ Camera Backbone]: 	15.91882 ms
[⏰ Camera Bevpool]: 	1.55072 ms
[⏰ VTransform]: 	1.69590 ms
[⏰ Transfusion]: 	7.21994 ms
[⏰ Head BoundingBox]: 	12.86070 ms
Total: 56.866 ms
=============================================

What steps do I have to follow to get 25 FPS? I am using a Jetson AGX Orin Developer Kit with these versions:

  • JetPack: 5.1.2
  • L4T: 35.4.1
  • CUDA: 11.4.315
  • cuDNN: 8.6.0.166
  • TensorRT: 8.5.2.2

The output was generated by following the steps in the repository with the example data. Thank you so much.
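For reference, the frame rate implied by the log above can be recomputed from the `Total:` lines. The following is a minimal sketch, not part of the repository: the function name `summarize_totals` and the choice to drop the first frame as warm-up are assumptions made for illustration.

```python
import re
from statistics import mean

# Matches the "Total: 55.213 ms" line that closes each BEVFusion timing block.
TOTAL_RE = re.compile(r"^Total:\s*([0-9.]+)\s*ms", re.MULTILINE)


def summarize_totals(log_text, warmup=1):
    """Return (mean latency in ms, FPS) over the frames after `warmup`.

    Dropping the first frame is an assumption: it is usually slower
    because of one-time initialization.
    """
    totals = [float(value) for value in TOTAL_RE.findall(log_text)]
    steady = totals[warmup:]
    if not steady:
        raise ValueError("no frames left after dropping warm-up")
    mean_ms = mean(steady)
    return mean_ms, 1000.0 / mean_ms


# Totals copied from the six frames posted above.
sample_log = """
Total: 105.406 ms
Total: 55.213 ms
Total: 55.044 ms
Total: 60.640 ms
Total: 58.193 ms
Total: 56.866 ms
"""

mean_ms, fps = summarize_totals(sample_log)
print(f"steady-state mean: {mean_ms:.3f} ms -> {fps:.1f} FPS")
```

With the first frame excluded, the mean is about 57.2 ms, which corresponds to roughly 17.5 FPS and matches the "around 17 Hz" reported above.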

dav695 avatar Sep 08 '23 11:09 dav695

You should check your device clock frequencies, for example by running jetson_clocks.

hopef avatar Sep 26 '23 02:09 hopef

Hello, thank you for your answer. I have been running some tests, but I do not know how to proceed with the frequency configuration. Everything seems to be okay:

sudo jetson_clocks --show
SOC family:tegra234  Machine:Jetson AGX Orin Developer Kit
Online CPUs: 0-11
cpu0:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu1:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu2:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu3:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu4:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu5:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu6:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu7:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu8:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu9:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu10: Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
cpu11: Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0 
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=3199000000 FreqOverride=1
DLA0_CORE:   Online=1 MinFreq=0 MaxFreq=1600000000 CurrentFreq=1600000000
DLA0_FALCON: Online=1 MinFreq=0 MaxFreq=844800000 CurrentFreq=844800000
DLA1_CORE:   Online=1 MinFreq=0 MaxFreq=1600000000 CurrentFreq=1600000000
DLA1_FALCON: Online=1 MinFreq=0 MaxFreq=844800000 CurrentFreq=844800000
PVA0_VPS0: Online=1 MinFreq=0 MaxFreq=1152000000 CurrentFreq=1152000000
PVA0_AXI:  Online=1 MinFreq=0 MaxFreq=832000000 CurrentFreq=832000000
FAN Dynamic Speed control=active hwmon2_pwm1=56
NV Power Mode: MAXN
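A listing like the one above can also be checked mechanically. The sketch below is an illustration only: the helper `units_below_max` is hypothetical, and treating `CurrentFreq == MaxFreq` as "pinned" is an assumed reading of the `jetson_clocks --show` output rather than an official criterion.

```python
import re

# Matches KEY=VALUE pairs such as "MaxFreq=1300500000".
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def units_below_max(show_output):
    """Return the names of units whose CurrentFreq differs from MaxFreq."""
    lagging = []
    for line in show_output.splitlines():
        fields = dict(PAIR_RE.findall(line))
        if "MaxFreq" not in fields or "CurrentFreq" not in fields:
            continue  # header, fan, and power-mode lines carry no clocks
        # The unit name is the first token: "cpu0:", "GPU", "EMC", ...
        name = line.split()[0].rstrip(":")
        if fields["CurrentFreq"] != fields["MaxFreq"]:
            lagging.append(name)
    return lagging


# Lines copied from the output posted above.
posted = """\
cpu0:  Online=1 Governor=schedutil MinFreq=2201600 MaxFreq=2201600 CurrentFreq=2201600 IdleStates: WFI=0 c7=0
GPU MinFreq=1300500000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=3199000000 FreqOverride=1
DLA0_CORE:   Online=1 MinFreq=0 MaxFreq=1600000000 CurrentFreq=1600000000
FAN Dynamic Speed control=active hwmon2_pwm1=56
NV Power Mode: MAXN
"""

# Hypothetical line, NOT from the thread, showing a GPU left at a low clock.
throttled = "GPU MinFreq=306000000 MaxFreq=1300500000 CurrentFreq=306000000\n"

print(units_below_max(posted))
print(units_below_max(posted + throttled))
```

For the posted output every unit reports `CurrentFreq` equal to `MaxFreq`, which supports the observation that the clocks already look pinned.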

And the mean time on the terminal output is:

Mean: 54.629 ms

But when I redirect the output to a text file instead of printing it to the terminal, the time is reduced:

Mean: 48.823 ms

Here is my jtop output while jetson_clocks is running: [image: jtop_output]

I think I am close to the 40 ms target (25 FPS), but something is still missing from my configuration.
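The distance to the target can be made explicit with a little arithmetic. This is a small sketch using only the two mean latencies reported above; the function names are chosen here for illustration.

```python
def fps_from_latency(latency_ms):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


def latency_budget(target_fps):
    """Per-frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / target_fps


target_ms = latency_budget(25)  # 25 FPS leaves 40 ms per frame

# Mean latencies reported above for the two ways of capturing the output.
for label, mean_ms in [("terminal", 54.629), ("redirected to file", 48.823)]:
    over = mean_ms - target_ms
    print(f"{label}: {fps_from_latency(mean_ms):.1f} FPS, "
          f"{over:.3f} ms over the {target_ms:.0f} ms budget")
```

The two measurements correspond to about 18.3 FPS and 20.5 FPS, so roughly 14.6 ms and 8.8 ms per frame would still have to be saved to reach 25 FPS.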

dav695 avatar Sep 27 '23 07:09 dav695

Hi, may I ask what mAP you got? I only get 0.5753 mAP when I run test-mAP-for-cuda.py, and I don't know why. Thank you so much!

YueSun0609 avatar Oct 10 '23 01:10 YueSun0609

Similar issue: I get 60+ ms when running on the Orin following the author's instructions. The Lidar Backbone and Camera Backbone take too much time. Could you give me some pointers on what to check? Thanks @dav695 @hopef

[⏰ [NoSt] CopyLidar]: 	0.40496 ms
[⏰ [NoSt] ImageNrom]: 	0.59677 ms
[⏰ Lidar Backbone]: 	28.70608 ms
[⏰ Camera Depth]: 	3.53670 ms
[⏰ Camera Backbone]: 	8.67360 ms
[⏰ Camera Bevpool]: 	1.45706 ms
[⏰ VTransform]: 	1.74602 ms
[⏰ Transfusion]: 	7.06768 ms
[⏰ Head BoundingBox]: 	10.52410 ms
Total: 61.782 ms

GeLink9999 avatar Nov 09 '23 07:11 GeLink9999

You should make sure your device is running in MAXN mode.

sudo nvpmodel -q
NV Power Mode:  MAXN

To switch to MAXN, use the command: sudo nvpmodel -m 0.
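If the check needs to be scripted, the `NV Power Mode:` line from the query output can be parsed as text. This is a minimal sketch: the helper `parse_power_mode` is hypothetical, and it assumes the output contains a line of exactly the form shown above.

```python
def parse_power_mode(query_output):
    """Return the mode name from `nvpmodel -q` output, or None if absent."""
    for line in query_output.splitlines():
        if line.startswith("NV Power Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Output line copied from the comment above.
sample = "NV Power Mode:  MAXN\n"

mode = parse_power_mode(sample)
print("MAXN active" if mode == "MAXN" else f"unexpected mode: {mode}")

# On a device, the text could be captured with, for example:
#   import subprocess
#   text = subprocess.run(["sudo", "nvpmodel", "-q"],
#                         capture_output=True, text=True).stdout
```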

hopef avatar Nov 09 '23 08:11 hopef

I tried to use sudo nvpmodel -m 0 to switch the power mode to MAXN, but it appears to fail:

sudo nvpmodel -m 0
NVPM ERROR: null input file!
NVPM ERROR: Failed to parse pm.conf

I also compared the same model and code between an x86-based computer (total time ~34 ms) and the Orin (total time ~62 ms). The SM count and memory clock rate differ significantly:

[11/09/2023-10:12:43] [I] === Device Information ===
[11/09/2023-10:12:43] [I] Selected Device: NVIDIA GeForce RTX 3060 Laptop GPU
[11/09/2023-10:12:43] [I] Compute Capability: 8.6
[11/09/2023-10:12:43] [I] SMs: 30
[11/09/2023-10:12:43] [I] Compute Clock Rate: 1.425 GHz
[11/09/2023-10:12:43] [I] Device Global Memory: 5946 MiB
[11/09/2023-10:12:43] [I] Shared Memory per SM: 100 KiB
[11/09/2023-10:12:43] [I] Memory Bus Width: 192 bits (ECC disabled)
[11/09/2023-10:12:43] [I] Memory Clock Rate: 7.001 GHz


[11/08/2023-09:31:00] [I] === Device Information ===
[11/08/2023-09:31:00] [I] Selected Device: Orin
[11/08/2023-09:31:00] [I] Compute Capability: 8.7
[11/08/2023-09:31:00] [I] SMs: 16
[11/08/2023-09:31:00] [I] Compute Clock Rate: 1.275 GHz
[11/08/2023-09:31:00] [I] Device Global Memory: 24845 MiB
[11/08/2023-09:31:00] [I] Shared Memory per SM: 164 KiB
[11/08/2023-09:31:00] [I] Memory Bus Width: 128 bits (ECC disabled)
[11/08/2023-09:31:00] [I] Memory Clock Rate: 1.275 GHz
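As a rough sanity check on these two device reports, the SM count multiplied by the compute clock gives a crude throughput proxy. The sketch below is an assumption-laden estimate, not a performance model: it ignores memory bandwidth, Tensor Core behavior, and kernel efficiency, and the helper name is chosen here for illustration.

```python
def sm_clock_product(sm_count, clock_ghz):
    """Crude compute proxy: number of SMs times the compute clock in GHz."""
    return sm_count * clock_ghz


# Values copied from the two "Device Information" logs above.
rtx3060 = sm_clock_product(30, 1.425)
orin = sm_clock_product(16, 1.275)

proxy_ratio = rtx3060 / orin   # how much more raw SM*clock the RTX 3060 has
latency_ratio = 62.0 / 34.0    # Orin total time over x86 total time (approx.)

print(f"SM*clock ratio (RTX 3060 / Orin): {proxy_ratio:.2f}")
print(f"measured latency ratio (Orin / RTX 3060): {latency_ratio:.2f}")
```

Under this proxy the RTX 3060 has roughly 2.1 times the raw SM-clock product, which is the same order as the reported latency gap of about 1.8 times. That suggests the difference between the two machines may largely reflect hardware capability rather than a misconfiguration, although the proxy is too coarse to prove it.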

GeLink9999 avatar Nov 14 '23 01:11 GeLink9999

Yes, my result is the same as yours, around 50 ms, even though I have selected the MAXN mode. I would really like to know how to reach the FPS reported by the project.

super-liuyang avatar Jan 09 '24 12:01 super-liuyang