Deploy on NVIDIA Jetson using TensorRT and DeepStream SDK

This guide explains how to deploy a trained model to the NVIDIA Jetson platform and perform inference using TensorRT and DeepStream SDK. TensorRT is used here to maximize inference performance on the Jetson platform. UPDATED 18 November 2022.

Hardware Verification

We have tested and verified this guide on the following Jetson devices

Before You Start

Make sure you have properly installed JetPack SDK with all SDK components, as well as DeepStream SDK, on the Jetson device. Together they provide CUDA, TensorRT and DeepStream SDK, which this guide requires.

JetPack SDK provides a full development environment for hardware-accelerated AI-at-the-edge development. All Jetson modules and developer kits are supported by JetPack SDK.

There are two main installation methods:

SD Card Image Method
NVIDIA SDK Manager Method

You can find a very detailed installation guide on the official NVIDIA website. You can also find guides for the above-mentioned reComputer J1010 and reComputer J2021.

Install Necessary Packages

Step 1. Open a terminal on the Jetson device, then install and upgrade pip

sudo apt update
sudo apt install -y python3-pip
pip3 install --upgrade pip

Step 2. Clone the following repo

git clone https://github.com/ultralytics/yolov5

Step 3. Open requirements.txt

cd yolov5
vi requirements.txt

Step 4. Comment out the following lines so that they look like the ones below. In vi, press i first to enter insert mode; when you are done, press ESC, then type :wq to save and quit

# torch>=1.7.0
# torchvision>=0.8.1

Note: torch and torchvision are excluded for now because they will be installed later.
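If you would rather not edit the file by hand in vi, the same change can be scripted. Below is a minimal Python sketch, assuming it is run from inside the yolov5 folder; `comment_out` and `comment_out_file` are hypothetical helper names, not part of YOLOv5. It compares exact package names so that commenting out `torch` does not accidentally catch other packages whose names merely start with "torch".

```python
import re
from pathlib import Path

# Packages to skip when installing from PyPI on Jetson; they are installed later from NVIDIA's wheels.
SKIP_PACKAGES = {"torch", "torchvision"}

# Captures the distribution name at the start of a requirement line, e.g. "torch" in "torch>=1.7.0".
# Lines that already start with "#" never match, so running the script twice is harmless.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def comment_out(text, packages=SKIP_PACKAGES):
    """Return (new_text, names) with every requirement line for `packages` prefixed by '# '."""
    out, names = [], []
    for line in text.splitlines(keepends=True):
        m = _NAME_RE.match(line)
        # Exact name comparison, so "torch" does not also match "torchvision".
        if m and m.group(1).lower() in packages:
            out.append("# " + line)
            names.append(m.group(1))
        else:
            out.append(line)
    return "".join(out), names


def comment_out_file(path="requirements.txt"):
    """Edit requirements.txt in place and return the names that were commented out."""
    p = Path(path)
    new_text, names = comment_out(p.read_text())
    p.write_text(new_text)
    return names
```

Calling `comment_out_file()` once inside the yolov5 folder should leave requirements.txt in the state the step above describes.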

Step 5. Install the dependency below

sudo apt install -y libfreetype6-dev

Step 6. Install the necessary packages

pip3 install -r requirements.txt

Install PyTorch and Torchvision

We cannot install PyTorch and Torchvision from pip because the standard pip packages are not built for the Jetson platform, which is based on the ARM aarch64 architecture. Therefore, we need to manually install a pre-built PyTorch pip wheel and compile/install Torchvision from source.

Visit this page to access all the PyTorch and Torchvision links.

Here are some of the available versions and the JetPack releases that support them.

PyTorch v1.10.0

Supported by JetPack 4.4 (L4T R32.4.3) / JetPack 4.4.1 (L4T R32.4.4) / JetPack 4.5 (L4T R32.5.0) / JetPack 4.5.1 (L4T R32.5.1) / JetPack 4.6 (L4T R32.6.1) with Python 3.6

file_name: torch-1.10.0-cp36-cp36m-linux_aarch64.whl URL: https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl

PyTorch v1.12.0

Supported by JetPack 5.0 (L4T R34.1.0) / JetPack 5.0.1 (L4T R34.1.1) / JetPack 5.0.2 (L4T R35.1.0) with Python 3.8

file_name: torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl URL: https://developer.download.nvidia.com/compute/redist/jp/v50/pytorch/torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl

Step 1. Install the torch wheel that matches your JetPack version, using the following format

wget <URL> -O <file_name>
pip3 install <file_name>

For example, here we are running JP4.6.1 and therefore we choose PyTorch v1.10.0

cd ~
sudo apt-get install -y libopenblas-base libopenmpi-dev
wget https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl -O torch-1.10.0-cp36-cp36m-linux_aarch64.whl
pip3 install torch-1.10.0-cp36-cp36m-linux_aarch64.whl
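Choosing the wheel is essentially a lookup from your JetPack version. Below is a minimal Python sketch that encodes only the two wheels listed above; `WHEELS` and `install_commands` are hypothetical names, and JetPack versions that this guide does not list are rejected rather than guessed.

```python
# (file_name, URL) pairs copied from the list above.
_TORCH_1_10 = (
    "torch-1.10.0-cp36-cp36m-linux_aarch64.whl",
    "https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl",
)
_TORCH_1_12 = (
    "torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl",
    "https://developer.download.nvidia.com/compute/redist/jp/v50/pytorch/"
    "torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl",
)

# JetPack version -> wheel (Python 3.6 on JetPack 4.x, Python 3.8 on JetPack 5.x).
WHEELS = {
    "4.4": _TORCH_1_10, "4.4.1": _TORCH_1_10, "4.5": _TORCH_1_10,
    "4.5.1": _TORCH_1_10, "4.6": _TORCH_1_10,
    "4.6.1": _TORCH_1_10,  # the example in this guide uses this wheel on JP4.6.1
    "5.0": _TORCH_1_12, "5.0.1": _TORCH_1_12, "5.0.2": _TORCH_1_12,
}


def install_commands(jetpack_version):
    """Return the wget and pip3 command lines for the given JetPack version."""
    try:
        file_name, url = WHEELS[jetpack_version]
    except KeyError:
        raise ValueError(f"no wheel listed in this guide for JetPack {jetpack_version}") from None
    return [f"wget {url} -O {file_name}", f"pip3 install {file_name}"]
```

For example, `print("\n".join(install_commands("4.6.1")))` prints the same two commands as the example above.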

Step 2. Install the torchvision version that matches the PyTorch version you installed. For example, we chose PyTorch v1.10.0, which means we need Torchvision v0.11.1

sudo apt install -y libjpeg-dev zlib1g-dev
git clone --branch v0.11.1 https://github.com/pytorch/vision torchvision
cd torchvision
sudo python3 setup.py install

Here is a list of the torchvision versions to install for each PyTorch version:

PyTorch v1.10 - torchvision v0.11.1
PyTorch v1.12 - torchvision v0.13.0
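The pairing above can be checked programmatically after installation. Below is a minimal Python sketch; the table only covers the two pairs listed in this guide, and `torchvision_tag` and `check_installed` are hypothetical helpers. The comparison uses major.minor only, on the assumption that a local source build may add a suffix to its version string.

```python
import re

# PyTorch (major, minor) -> torchvision git tag to clone, from the list above.
TORCHVISION_FOR_TORCH = {(1, 10): "v0.11.1", (1, 12): "v0.13.0"}


def major_minor(version):
    """'1.12.0a0+2c916ef.nv22.3' -> (1, 12); also accepts tags such as 'v0.11.1'."""
    m = re.match(r"v?(\d+)\.(\d+)", version)
    if not m:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def torchvision_tag(torch_version):
    """Return the torchvision tag to build for the given torch version."""
    key = major_minor(torch_version)
    if key not in TORCHVISION_FOR_TORCH:
        raise ValueError(f"no torchvision version listed for torch {torch_version}")
    return TORCHVISION_FOR_TORCH[key]


def check_installed():
    """On the Jetson: report whether the installed torch/torchvision pair matches the list."""
    try:
        import torch
        import torchvision
    except ImportError as exc:
        return f"not installed yet: {exc}"
    try:
        tag = torchvision_tag(torch.__version__)
    except ValueError as exc:
        return str(exc)
    # Compare major.minor only, so a local build suffix in the version string
    # does not cause a false mismatch.
    ok = major_minor(torchvision.__version__) == major_minor(tag)
    status = "OK" if ok else f"mismatch, expected torchvision {tag}"
    return f"torch {torch.__version__} / torchvision {torchvision.__version__}: {status}"
```

Running `print(check_installed())` on the device after Step 2 gives a quick sanity check before moving on to DeepStream.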

DeepStream Configuration for YOLOv5

Step 1. Clone the following repo

cd ~
git clone https://github.com/marcoslucianops/DeepStream-Yolo

Step 2. Copy gen_wts_yoloV5.py from DeepStream-Yolo/utils into yolov5 directory

cp DeepStream-Yolo/utils/gen_wts_yoloV5.py yolov5

Step 3. Inside the yolov5 repo, download a .pt file from the YOLOv5 releases (example for YOLOv5s v6.1)

cd yolov5
wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt

Step 4. Generate the cfg and wts files

python3 gen_wts_yoloV5.py -w yolov5s.pt

Note: To change the inference size (default: 640), add one of the following options

-s SIZE
--size SIZE
-s HEIGHT WIDTH
--size HEIGHT WIDTH

Example for 1280:

-s 1280
or
-s 1280 1280

Step 5. Copy the generated cfg and wts files into the DeepStream-Yolo folder

cp yolov5s.cfg ~/DeepStream-Yolo
cp yolov5s.wts ~/DeepStream-Yolo

Step 6. Open the DeepStream-Yolo folder and compile the library

cd ~/DeepStream-Yolo
CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo  # for DeepStream 6.1
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo  # for DeepStream 6.0.1 / 6.0
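The only thing that changes between the two commands is CUDA_VER, which depends on the DeepStream release. Below is a hypothetical Python wrapper (`make_env` and `build_library` are illustrative names) that encodes just the cases shown above and runs the same make command. The `opencv` flag corresponds to the OPENCV=1 rebuild used later for INT8 calibration.

```python
import os
import subprocess

# DeepStream release -> CUDA_VER passed to make (values from the commands above).
CUDA_VER_FOR_DEEPSTREAM = {"6.1": "11.4", "6.0.1": "10.2", "6.0": "10.2"}


def make_env(deepstream_version, opencv=False):
    """Environment for `make -C nvdsinfer_custom_impl_Yolo`."""
    try:
        cuda_ver = CUDA_VER_FOR_DEEPSTREAM[deepstream_version]
    except KeyError:
        raise ValueError(f"CUDA_VER not listed in this guide for DeepStream {deepstream_version}") from None
    env = dict(os.environ, CUDA_VER=cuda_ver)
    if opencv:
        env["OPENCV"] = "1"  # only needed for INT8 calibration
    return env


def build_library(deepstream_version, repo="~/DeepStream-Yolo", opencv=False):
    """Run the make command from inside the DeepStream-Yolo folder."""
    subprocess.run(["make", "-C", "nvdsinfer_custom_impl_Yolo"],
                   cwd=os.path.expanduser(repo),
                   env=make_env(deepstream_version, opencv),
                   check=True)
```

For example, `build_library("6.1")` is equivalent to the first command above.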

Step 7. Edit the config_infer_primary_yoloV5.txt file according to your model

[property]
...
custom-network-config=yolov5s.cfg
model-file=yolov5s.wts
...

Step 8. Edit the deepstream_app_config.txt file

...
[primary-gie]
...
config-file=config_infer_primary_yoloV5.txt

Step 9. Change the video source in the deepstream_app_config.txt file. By default, a sample video file is loaded, as shown below

...
[source0]
...
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
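Steps 7 to 9 all follow the same pattern: set one `key=value` inside one `[section]`. Below is a minimal, hypothetical line-based helper (`set_key`, `edit_file`) for making those edits from a script. It assumes the simple INI-like layout shown above and only touches the first uncommented occurrence of the key. It works line by line rather than through configparser so that comments survive.

```python
import re


def set_key(text, section, key, value):
    """Set key=value inside [section] of a DeepStream-style config and return the new text.

    Only the first uncommented occurrence is replaced; if the key is missing,
    it is appended to the end of the section.
    """
    out, in_section, done = [], False, False
    key_re = re.compile(rf"\s*{re.escape(key)}\s*=")
    for line in text.splitlines():
        header = re.match(r"\s*\[([^\]]+)\]\s*$", line)
        if header:
            if in_section and not done:
                out.append(f"{key}={value}")  # key not found: add it before the next section
                done = True
            in_section = header.group(1) == section
        elif in_section and not done and key_re.match(line):
            out.append(f"{key}={value}")  # replace the existing entry
            done = True
            continue
        out.append(line)
    if in_section and not done:
        out.append(f"{key}={value}")  # target section is the last one in the file
        done = True
    if not done:
        raise KeyError(f"section [{section}] not found")
    return "\n".join(out) + "\n"


def edit_file(path, section, key, value):
    """Apply set_key to a config file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_key(text, section, key, value))
```

For example, `edit_file("deepstream_app_config.txt", "source0", "uri", "file:///path/to/your/video.mp4")` points the pipeline at your own video (the path is a placeholder).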

Run the Inference

deepstream-app -c deepstream_app_config.txt

The result above was obtained on Jetson Xavier NX with FP32 and YOLOv5s at 640x640. The FPS is around 30.

INT8 Calibration

If you want to use INT8 precision for inference, follow the steps below

Step 1. Install OpenCV

sudo apt-get install libopencv-dev

Step 2. Compile/recompile the nvdsinfer_custom_impl_Yolo library with OpenCV support

cd ~/DeepStream-Yolo
CUDA_VER=11.4 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo  # for DeepStream 6.1
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo  # for DeepStream 6.0.1 / 6.0

Step 3. For the COCO dataset, download val2017, extract it, and move it to the DeepStream-Yolo folder

Step 4. Make a new directory for calibration images

mkdir calibration

Step 5. Run the following to copy 1000 random images from the COCO dataset for calibration

for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
    cp ${jpg} calibration/; \
done

Note: NVIDIA recommends at least 500 images to get good accuracy. In this example, 1000 images are chosen for better accuracy (more images = more accuracy). Higher INT8_CALIB_BATCH_SIZE values will result in more accuracy and faster calibration; set it according to your GPU memory. You can change the number of images in head -1000; for example, use head -2000 for 2000 images. This process can take a long time.

Step 6. Create the calibration.txt file with all selected images

realpath calibration/*jpg > calibration.txt
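Steps 5 and 6 can also be done in one pass from Python. Below is a minimal sketch, assuming val2017 has been extracted into the current folder. `make_calibration_set` is a hypothetical helper; unlike `sort -R`, it accepts an optional seed so that the selection is reproducible. It lists only the images it copied, which matches Step 6 when the calibration folder starts out empty.

```python
import random
import shutil
from pathlib import Path


def make_calibration_set(src_dir="val2017", dst_dir="calibration",
                         list_file="calibration.txt", n=1000, seed=None):
    """Copy n random .jpg images from src_dir into dst_dir and write their absolute paths to list_file."""
    images = sorted(Path(src_dir).glob("*.jpg"))
    if len(images) < n:
        raise ValueError(f"only {len(images)} .jpg files in {src_dir}, need {n}")
    chosen = random.Random(seed).sample(images, n)  # sampling without replacement

    dst = Path(dst_dir)
    dst.mkdir(exist_ok=True)
    paths = []
    for img in chosen:
        target = dst / img.name
        shutil.copy(img, target)
        paths.append(str(target.resolve()))  # absolute path, like `realpath`
    paths.sort()  # sorted, similar to the order of the shell glob calibration/*jpg
    Path(list_file).write_text("\n".join(paths) + "\n")
    return paths
```

After `make_calibration_set(n=1000)`, continue with Step 7 exactly as written, since calibration.txt is in the same format.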

Step 7. Set environment variables

export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1

Step 8. Update the config_infer_primary_yoloV5.txt file

From

...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...

To

...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
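The three edits above can be applied with a small script. Below is a minimal, hypothetical sketch (`fp32_to_int8` is an illustrative name) that rewrites the engine file name, uncomments `int8-calib-file`, and sets `network-mode=1`. In the nvinfer plugin, network-mode 0, 1 and 2 select FP32, INT8 and FP16 respectively. The INT8_CALIB_* environment variables from Step 7 still need to be exported before running the inference.

```python
import re


def fp32_to_int8(text, calib_file="calib.table"):
    """Apply the FP32 -> INT8 edits to the text of config_infer_primary_yoloV5.txt."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if re.match(r"model-engine-file\s*=", stripped):
            # model_b1_gpu0_fp32.engine -> model_b1_gpu0_int8.engine
            line = re.sub(r"_fp(32|16)\.engine", "_int8.engine", line)
        elif re.match(r"#?\s*int8-calib-file\s*=", stripped):
            line = f"int8-calib-file={calib_file}"  # uncomment and set the calibration table
        elif re.match(r"network-mode\s*=", stripped):
            line = "network-mode=1"  # 0=FP32, 1=INT8, 2=FP16
        out.append(line)
    return "\n".join(out) + "\n"
```

Because the function is idempotent, running it twice on the same file is safe.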

Step 9. Run the inference

deepstream-app -c deepstream_app_config.txt

The result above was obtained on Jetson Xavier NX with INT8 and YOLOv5s at 640x640. The FPS is around 60.

Benchmark results

The following table summarizes how different models perform on Jetson Xavier NX.

Model Name	Precision	Inference Size	Inference Time (ms)	FPS
YOLOv5s	FP32	320x320	16.66	60
YOLOv5s	FP32	640x640	33.33	30
YOLOv5s	INT8	640x640	16.66	60
YOLOv5n	FP32	640x640	16.66	60
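The Inference Time column is consistent with 1000 / FPS (1000 / 60 ≈ 16.67 ms, 1000 / 30 ≈ 33.33 ms), with the table truncating rather than rounding. Below is a short sketch of that conversion, using the rows above. `ms_per_frame` is an illustrative helper, and the script rounds to two decimals, so it prints 16.67 where the table shows 16.66.

```python
def ms_per_frame(fps):
    """Per-frame time in milliseconds implied by a throughput in frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Rows from the table above: (model, precision, size, fps)
ROWS = [
    ("YOLOv5s", "FP32", "320x320", 60),
    ("YOLOv5s", "FP32", "640x640", 30),
    ("YOLOv5s", "INT8", "640x640", 60),
    ("YOLOv5n", "FP32", "640x640", 60),
]

for model, precision, size, fps in ROWS:
    print(f"{model:8} {precision:5} {size:8} {fps:3d} FPS -> {ms_per_frame(fps):.2f} ms/frame")
```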

Additional

This tutorial was written by our friends at Seeed, @lakshanthad and Elaine.

Sep 28 '22 06:09 AyushExel

@AyushExel awesome! Should this be renamed to something like NVIDIA Jetson Nano deployment tutorial?

Sep 29 '22 21:09 glenn-jocher

@glenn-jocher yeah. "NVIDIA Jetson Nano deployment tutorial" sounds good. And maybe just pin it or add it to the wiki?

Sep 29 '22 21:09 AyushExel

@AyushExel awesome, added to wiki. I think I'll add to README also. Are those times in the last table right BTW?

Sep 29 '22 21:09 glenn-jocher

@glenn-jocher yes. It's also mentioned here - https://wiki.seeedstudio.com/YOLOv5-Object-Detection-Jetson/

Sep 29 '22 21:09 AyushExel

thanks for the great documentation @AyushExel I'll give it a try in the next day or two

Oct 06 '22 06:10 barney2074

Hi @AyushExel

For step 4. python3 gen_wts_yoloV5.py -w yolov5s.pt I get an error Illegal instruction (core dumped)

I'm using a Seeed reComputer J1010 (Jetson Nano) with Jetpack 4.6.2 and I've tried a couple of times with a fresh flash of the Jetson each time.

I noticed that YoloV5 requires Python 3.7, whereas Jetpack 4.6.2 includes Python 3.6.9, so I used YoloV5 v6.0 (and v6.2 initially)

EDIT: also tried JP4.6.1 (same result)

thanks in advance

Andrew

Oct 11 '22 05:10 barney2074

@lakshanthad do you know what's causing this?

Oct 11 '22 07:10 AyushExel

thanks @AyushExel

I've found the crash report (which I can send to you or @lakshanthad). It's pretty big, around 9 MB.

I also noticed the SeeedStudio article here: similar/the same? https://wiki.seeedstudio.com/YOLOv5-Object-Detection-Jetson/ I haven't tried it yet; it's a bit more complicated.

The first 10 lines are:

ProblemType: Crash
Architecture: arm64
CrashCounter: 1
Date: Tue Oct 11 18:06:08 2022
DistroRelease: Ubuntu 18.04
ExecutablePath: /usr/bin/python3.6
ExecutableTimestamp: 1656503157
ProcAttrCurrent: Error: [Errno 22] Invalid argument
ProcCmdline: python3 gen_wts_yoloV5.py -w yolov5s.pt
ProcCwd: /home/nano/yolov5

Oct 11 '22 08:10 barney2074

Hello @barney2074,

Could you tell me exactly which command triggers this crash? Please also attach the report here if possible.

> I also noticed the SeeedStudio article here: similar/the same? https://wiki.seeedstudio.com/YOLOv5-Object-Detection-Jetson/

Yes. Most of the content in this GitHub guide is based on that wiki, which explains the entire process from labeling to deployment.

Oct 11 '22 12:10 lakshanthad

Hi @lakshanthad

This occurs after the command **python3 gen_wts_yoloV5.py -w yolov5s.pt** I've put the crash report here: https://drive.google.com/drive/folders/14bu_dNwQ9VbBLMKDBw92t0vUc3e9Rh00?usp=sharing

I have also tried the Seeed wiki- I'll put outcome in a separate post to avoid confusing the issue

thanks

Andrew

Oct 11 '22 23:10 barney2074

Hi @lakshanthad

At step 19 of the Seeed wiki (serialising the model) I get the following error: I've tried a few different models, including https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt and some custom ones

nano@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s best.wts best.engine n6
Loading weights: best.wts
[10/12/2022-09:55:05] [E] [TRT] 3: (Unnamed Layer* 0) [Convolution]:kernel weights has count 3456 but 1728 was expected
[10/12/2022-09:55:05] [E] [TRT] 4: (Unnamed Layer* 0) [Convolution]: count of 3456 weights in kernel, but kernel dimensions (6,6) with 3 input channels, 16 output channels and 1 groups were specified. Expected Weights count is 3 * 6*6 * 16 / 1 = 1728
[10/12/2022-09:55:05] [E] [TRT] 4: [convolutionNode.cpp::computeOutputExtents::39] Error Code 4: Internal Error ((Unnamed Layer* 0) [Convolution]: number of kernel weights does not match tensor dimensions)
[the three error lines above repeat 34 more times]
[10/12/2022-09:55:05] [E] [TRT] 3: [network.cpp::addScale::737] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addScale::737, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
)
yolov5: /home/nano/tensorrtx/yolov5/common.hpp:153: nvinfer1::IScaleLayer* addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string<char>, nvinfer1::Weights>&, nvinfer1::ITensor&, std::__cxx11::string, float): Assertion `scale_1' failed.
Aborted
nano@ubuntu:~/tensorrtx/yolov5/build$
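The numbers in the repeated error can be checked by hand. Below is a short sketch, assuming the YOLOv5 v6.x first layer (a Conv with 64 × width_multiple output channels and a 6x6 kernel, as in the models/*.yaml files) and the standard width multiples. It shows that 3456 is exactly the count for the s width (32 output channels), while the engine was built with 16 output channels, the n width requested by the `n6` argument. This suggests, though the thread does not confirm it, that the .wts file and the model-size argument passed to `./yolov5 -s` do not match.

```python
def conv_weight_count(in_ch, out_ch, kernel, groups=1):
    """Kernel weights of a Conv2d (bias excluded): in_ch / groups * k * k * out_ch."""
    return (in_ch // groups) * kernel * kernel * out_ch


# Assumed YOLOv5 width multiples; first layer has 64 * width output channels.
WIDTH_MULTIPLE = {"n": 0.25, "s": 0.50, "m": 0.75, "l": 1.00, "x": 1.25}

for size, width in WIDTH_MULTIPLE.items():
    out_ch = int(64 * width)
    print(size, out_ch, conv_weight_count(3, out_ch, 6))
```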

Oct 11 '22 23:10 barney2074

@barney2074 I had the same issue too in Jetson-nano b01 dev. It is solved by setting the following environment variable: export OPENBLAS_CORETYPE=ARMV8

But after that, a problem occurs when building deepstream.

dinobei@dinobei-desktop:~/DeepStream-Yolo$ CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
make: Entering directory '/home/dinobei/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo'
g++ -c  -o yolo.o -Wall -std=c++11 -shared -fPIC -Wno-error=deprecated-declarations -I/opt/nvidia/deepstream/deepstream/sources/includes -I/usr/local/cuda-10.2/include yolo.cpp
In file included from yolo.cpp:26:0:
yolo.h:44:10: fatal error: nvdsinfer_custom_impl.h: No such file or directory
 #include "nvdsinfer_custom_impl.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:70: recipe for target 'yolo.o' failed
make: *** [yolo.o] Error 1
make: Leaving directory '/home/dinobei/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo'

Oct 14 '22 03:10 dinobei
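The g++ command in the log above looks for nvdsinfer_custom_impl.h under /opt/nvidia/deepstream/deepstream/sources/includes (the -I path in the log). That directory is installed by the DeepStream SDK itself, not by the DeepStream-Yolo repo. Below is a quick pre-build check, sketched in Python. It assumes the SDK uses this default install prefix, and the helper name is hypothetical.

```python
from pathlib import Path

# Include directory passed to g++ in the build log above.
DEEPSTREAM_INCLUDES = Path("/opt/nvidia/deepstream/deepstream/sources/includes")


def missing_deepstream_headers(include_dir: Path = DEEPSTREAM_INCLUDES,
                               headers=("nvdsinfer_custom_impl.h",)) -> list:
    """Return the headers that are not present in include_dir."""
    return [h for h in headers if not (include_dir / h).is_file()]


if __name__ == "__main__":
    missing = missing_deepstream_headers()
    if missing:
        print("DeepStream SDK headers not found:", ", ".join(missing))
        print("Install the DeepStream SDK (e.g. via NVIDIA SDK Manager) before running make.")
    else:
        print("DeepStream headers found; the include path in the Makefile should resolve.")
```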

Thanks @dinobei

One step forward... I'm not sure if deploying YOLOv5 models on Jetson hardware is inherently tricky, but from my perspective, it would be great if there were an easier path.

Andrew

Oct 14 '22 03:10 barney2074

@lakshanthad do you know what's causing this error? It seems to originate from the DeepStream-Yolo module. Is there a way to run this without it? @barney2074 I haven't had time to try it out on my Nano yet, so I'm not much help here. I'll try it out soon.

Oct 14 '22 03:10 AyushExel

Hello, sorry for the late reply. Could you tell me how DeepStream was installed in the first place, @dinobei @barney2074? Sometimes an improper DeepStream installation can cause errors later on.

It is recommended to select it inside NVIDIA SDK Manager when installing JetPack, because this ensures there will be no compatibility or missing-dependency issues.

Oct 14 '22 04:10 lakshanthad


Well, if you just want to deploy, you can use the pre-trained PyTorch model to perform inference. In that case, follow the guide above up to and including the Install PyTorch and Torchvision section, then execute python3 detect.py --source <video_source>. But the goal of this document is to use TensorRT to increase performance on the Jetson platform, and to use TensorRT with a video stream, the DeepStream SDK is used.

So there are two ways to deploy on Jetson:

Without TensorRT
With TensorRT and DeepStream SDK

The first method is the quickest to deploy. However, the second method gives better model performance on the Jetson hardware than the first.

Oct 14 '22 04:10 lakshanthad
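For the first method (no TensorRT), detect.py can also be launched from a Python script. The sketch below only assembles and runs the command. --weights, --source and --device are existing detect.py options, while the helper names and defaults are illustrative. Passing --device 0 requests the first CUDA GPU explicitly. That makes it easier to see whether PyTorch is really using the GPU, which is relevant to the slow detect.py runs reported later in this thread.

```python
import subprocess
import sys


def detect_command(source: str, weights: str = "yolov5s.pt", device: str = "0") -> list:
    """Build a detect.py command line (run from inside the yolov5 repo)."""
    return [
        sys.executable, "detect.py",  # same interpreter as this script (python3 on Jetson)
        "--weights", weights,
        "--source", source,           # video file, image folder, or camera index such as "0"
        "--device", device,           # "0" = first CUDA GPU, "cpu" = force CPU
    ]


def run_detect(source: str, **kwargs) -> int:
    """Run detect.py and return its exit code."""
    return subprocess.run(detect_command(source, **kwargs), check=False).returncode


if __name__ == "__main__":
    print(" ".join(detect_command("0")))
```

Comparing run_detect(src, device="0") against run_detect(src, device="cpu") on the same clip gives a rough GPU vs CPU comparison.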

I think this document can be divided into two.

Without TensorRT (fastest deployment)
With TensorRT and DeepStream SDK (takes some time to deploy)

Any suggestions? I can reorganize it as above and update this guide.

Oct 14 '22 04:10 lakshanthad

@lakshanthad thank you for the reply. What about TensorRT without DeepStream? Is using the TensorRT and DeepStream SDKs faster than using TensorRT alone (in terms of model performance)?

Oct 14 '22 06:10 dinobei


I made a huge mistake: I didn't install the DeepStream SDK. I thought DeepStream-Yolo and the DeepStream SDK were the same thing. JetPack is currently installed using the SD card image method; I will try reinstalling it with NVIDIA SDK Manager and share the results.

Oct 14 '22 06:10 dinobei

Can I know how DeepStream was installed in the first place

Hi @lakshanthad, I installed it using SDK Manager and flashed the OS at the same time, i.e. a completely fresh system.

I'm aiming to get my custom YOLOv5 model running on the Jetson, but I also tried yolov5s.pt as a test to rule out my custom model as the cause.

Just to clarify my understanding: the TensorRT .engine needs to be generated on the same processor architecture as the one used for inference, i.e. I can't generate it on an x86/RTX machine and run inference on an ARM (Jetson) one?

Andrew

Oct 14 '22 10:10 barney2074


@dinobei Yes, please try again and share your results.

Oct 14 '22 18:10 lakshanthad


@barney2074 Yes, you are right. The .engine file should be generated on the same processor architecture as the one used for inference, which also means serialization and deserialization should be done on the same architecture. When you use the DeepStream SDK as described in this guide and run deepstream-app -c deepstream_app_config.txt, it first serializes the model (generates the .engine file) and then, after some time, deserializes it to run inference.

However, with the Seeed wiki guide you mentioned earlier, where only TensorRT is used without the DeepStream SDK, you need to do this serialization and deserialization manually.
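For the TensorRT-only route, the deserialization step takes only a few lines with the TensorRT Python API (tensorrt.Runtime and deserialize_cuda_engine). Below is a minimal sketch, assuming the TensorRT Python bindings that ship with JetPack are installed. The file-naming helper that embeds platform.machine() is hypothetical: it is just one way to avoid loading an engine built on another machine. Note that TensorRT engines are also specific to the GPU model and the TensorRT version they were built with, not only to the CPU architecture.

```python
import platform
from pathlib import Path


def engine_path(model: str, precision: str = "fp16", directory: str = ".") -> Path:
    """Hypothetical naming scheme: tag the engine with the machine it was built on."""
    return Path(directory) / f"{model}_{platform.machine()}_{precision}.engine"


def load_engine(path: Path):
    """Deserialize a TensorRT engine that was serialized on this same device.

    Returns (runtime, engine); keep the runtime alive as long as the engine is used.
    """
    import tensorrt as trt  # imported lazily: only available where TensorRT is installed

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError(f"Failed to deserialize {path}; was it built on another device "
                           "or with another TensorRT version?")
    return runtime, engine


if __name__ == "__main__":
    print("Expected engine file:", engine_path("yolov5s"))
```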

Coming back to the issues you are still facing: have any of the issues you mentioned before been solved, or do they still exist?

Could we debug like this? First, try without TensorRT:

1. Go through the Install Necessary Packages and Install PyTorch and Torchvision sections at the beginning of this GitHub page.
2. Execute python3 detect.py --source <video_source>, which uses yolov5s.pt as the default model for inference.

Please let me know whether this works first.

Oct 14 '22 18:10 lakshanthad

@lakshanthad thank you for the reply. What about TensorRT without DeepStream? Is using the TensorRT and DeepStream SDKs faster than using TensorRT alone (in terms of model performance)?

There is no big difference in model performance. The way to use only TensorRT is this. However, it has no example for viewing detections on real-time video; the repo only supports image inference at the moment. The DeepStream SDK, on the other hand, comes with real-time video detection support. If you are comfortable with OpenCV, it should be possible to grab the video frames as images with OpenCV and run inference on them using only the TensorRT GitHub repo mentioned before.

Oct 14 '22 18:10 lakshanthad
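The OpenCV approach can be sketched as a loop that reads frames and hands each one to an inference function. cv2.VideoCapture.read() returns a (success, frame) pair, so the loop below works with it directly. The infer callable is a placeholder for whatever TensorRT inference code you use, such as the image inference from the TensorRT-only repo. It is an assumption here, not part of that repo. The helpers are duck-typed, so they can be exercised without a camera.

```python
from typing import Any, Callable, Iterator


def frames(capture) -> Iterator[Any]:
    """Yield frames until read() reports failure (end of file or camera error)."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


def process_stream(capture, infer: Callable[[Any], None], max_frames: int = 0) -> int:
    """Call infer(frame) on each frame and return how many frames were processed.

    max_frames=0 means "until the stream ends". The capture is always released.
    """
    count = 0
    try:
        for frame in frames(capture):
            infer(frame)  # placeholder: run TensorRT inference and draw/print results here
            count += 1
            if max_frames and count >= max_frames:
                break
    finally:
        capture.release()
    return count


def demo(source=0, max_frames=100) -> int:
    """Live example: print each frame's shape instead of running a real model."""
    import cv2  # OpenCV; imported here so the helpers above work without it

    cap = cv2.VideoCapture(source)  # camera index or video file path
    return process_stream(cap, lambda img: print(img.shape), max_frames=max_frames)
```

On the Jetson, demo() should print frame shapes from camera 0. Swapping the lambda for a real inference function is the only change needed to run detection.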

@glenn-jocher yeah. "Nvidia Jetson Nano deployment tutorial sounds good". And maybe just pin or add to wikis?

@glenn-jocher @AyushExel Could we change the title to "NVIDIA Jetson Platform Deployment"? It is better to have a common name rather than only "Jetson Nano".

Thank you.

Oct 14 '22 18:10 lakshanthad

2. Execute python3 detect.py --source <video_source> that will use yolov5s.pt as the default model for inference

Hi @lakshanthad

Yes, detect.py does work on my Jetson Nano (really slowly...).

As I noted before, YOLOv5 v6.2 requires Python >= 3.7.0, so I used YOLOv5 v6.0:

git clone https://github.com/ultralytics/yolov5 --branch v6.0

thanks

Andrew

Oct 16 '22 23:10 barney2074
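This workaround can be captured in a small check: pick the v6.0 tag when the interpreter is older than Python 3.7. JetPack 4.x is based on Ubuntu 18.04, which ships Python 3.6. Below is a sketch that assumes, as stated in the comment above, that v6.2 needs Python >= 3.7.0 and that v6.0 works on older interpreters. The helper names are hypothetical.

```python
import sys
from typing import List, Optional, Tuple


def yolov5_branch(version: Tuple[int, ...] = tuple(sys.version_info[:3])) -> Optional[str]:
    """Return 'v6.0' for interpreters older than 3.7, else None (use the default branch)."""
    return "v6.0" if version < (3, 7, 0) else None


def clone_command(version: Tuple[int, ...] = tuple(sys.version_info[:3])) -> List[str]:
    """Build the git clone command, pinning the branch only when needed."""
    cmd = ["git", "clone", "https://github.com/ultralytics/yolov5"]
    branch = yolov5_branch(version)
    if branch:
        cmd += ["--branch", branch]
    return cmd


if __name__ == "__main__":
    print(" ".join(clone_command()))
```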

I'm using JetPack 4.6.2 on the Jetson Nano and faced the same issue as @barney2074, despite installing everything with NVIDIA SDK Manager. The "core dumped" error starts appearing when installing torchvision. Unfortunately, the fix suggested by @dinobei did not work for me.

What did work for me, however, was downgrading NumPy from 1.19.5 to 1.19.4. I don't remember exactly where I read that this version causes problems, so I can't provide a source, but it worked. Posting this here in case someone else runs into the same issue.

Oct 26 '22 23:10 adityatandon

hi @adityatandon

Many thanks for the info. I don't have the device to hand, but I will try it next week and report back.

Andrew

Oct 27 '22 00:10 barney2074

@glenn-jocher @AyushExel Could we change the title to "NVIDIA Jetson Platform Deployment"? It is better to have a common name rather than only "Jetson Nano".

I didn't see this before. I'll make a PR to do this.

Oct 27 '22 04:10 AyushExel

Hi @lakshanthad @AyushExel

I've revisited this now that I have more time and also a different device (the proper dev kit version, rather than the Seeed version with limited memory).

I have made some progress. I had to deviate from the instructions a bit to get this far and have taken some notes, which I can provide if it helps.

The current status is:

The DeepStream app is working with the YOLOv5 model. I've played around with it and got it working with a camera rather than an MP4 file. I wouldn't say the performance is brilliant (around 5 FPS at 640x480).
Running YOLOv5 directly does work, but is incredibly slow. It seems to recognise the GPU but does not use it at all. In fact, inference on the CPU is faster. Is this what you would expect?

It would be great to have a tutorial on editing the DeepStream config to use a custom YOLOv5 model. I think converting to .engine using export.py is fairly clear, but it looks like the settings in the config file, the label file, etc. need to be altered.

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov5s.cfg
model-file=yolov5s.wts
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

Nov 11 '22 06:11 barney2074
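When switching the config above to a custom model, several values have to agree with each other. Most obviously, num-detected-classes must match the number of lines in the file named by labelfile-path. The file uses an INI-like key-file syntax, so Python's configparser can read it. The sketch below is a hypothetical pre-flight check and is not part of DeepStream or DeepStream-Yolo.

It also confirms two other things:
- net-scale-factor is about 1/255, which maps 8-bit pixel values into the 0 to 1 range.
- The engine file name's precision suffix matches network-mode, where nvinfer uses 0 = FP32, 1 = INT8 and 2 = FP16. This assumes the file name follows the model_b1_gpu0_fp32.engine pattern seen above.

```python
import configparser
from pathlib import Path


def check_nvinfer_config(config_text: str, labels_text: str) -> list:
    """Return a list of human-readable problems found in an nvinfer config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)  # lines starting with '#' are treated as comments
    prop = parser["property"]
    problems = []

    labels = [line for line in labels_text.splitlines() if line.strip()]
    num_classes = prop.getint("num-detected-classes")
    if num_classes != len(labels):
        problems.append(f"num-detected-classes={num_classes} but labels file has {len(labels)} entries")

    scale = prop.getfloat("net-scale-factor", fallback=1.0)
    if abs(scale - 1 / 255) > 1e-6:
        problems.append(f"net-scale-factor={scale} is not ~1/255")

    precision = {0: "fp32", 1: "int8", 2: "fp16"}.get(prop.getint("network-mode", fallback=0))
    engine = prop.get("model-engine-file", fallback="")
    if engine and precision and not engine.endswith(f"_{precision}.engine"):
        problems.append(f"model-engine-file={engine} does not match network-mode ({precision})")
    return problems


if __name__ == "__main__":
    cfg = Path("config_infer_primary_yoloV5.txt")  # hypothetical file name
    labels = Path("labels.txt")
    if cfg.is_file() and labels.is_file():
        for p in check_nvinfer_config(cfg.read_text(), labels.read_text()) or ["config looks consistent"]:
            print(p)
    else:
        print("config or labels file not found")
```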

Hello everyone,

Thank you @AyushExel and @glenn-jocher, this is a great tutorial on YOLOv5 for Jetson devices. I am running my Jetson Orin on JetPack 5.0.1-b118 with CUDA 11.4. I have pulled the yolov5 latest-arm64 Docker image. Training in the container works well, but only on the CPU. I have tried setting --device=0 (for the GPU on the Orin); however, the container cannot detect CUDA on the Orin. I have also tried uninstalling the newest PyTorch version and installing PyTorch v1.12.0 from the .whl file, but PyTorch in the container still cannot detect CUDA on the Orin. Do you have any ideas or thoughts about this problem? Thank you so much.

Nov 15 '22 23:11 Iongng198
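A quick way to see what PyTorch can see, inside the container or on the host, is to query torch.version.cuda and torch.cuda directly. This applies both to the question above and to the earlier report of detect.py not using the GPU. Below is a minimal sketch that only reports and fixes nothing. Two common causes are worth checking against NVIDIA's container documentation for your JetPack version; both are assumptions here, not a diagnosis. The first is a CPU-only aarch64 wheel, for which torch.version.cuda is None. The second is starting the container without the NVIDIA runtime (--runtime nvidia).

```python
def cuda_report() -> str:
    """Describe what the installed PyTorch build can see; does not raise if torch is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    built_with = torch.version.cuda  # None for CPU-only builds
    if built_with is None:
        return f"torch {torch.__version__} is a CPU-only build (no CUDA support compiled in)"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} was built for CUDA {built_with}, but no CUDA device is visible"
    return f"CUDA {built_with} OK: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(cuda_report())
```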


NVIDIA Jetson Nvidia Jetson Nano, Xavier NX, Orin Deployment tutorial

Deploy on NVIDIA Jetson using TensorRT and DeepStream SDK

Hardware Verification

Before You Start

Install Necessary Packages

Install PyTorch and Torchvision

DeepStream Configuration for YOLOv5

Run the Inference

INT8 Calibration

Benchmark results

Additional
