Failing to get INT8 engine with DeepStream 7.1
Hi, I tried to build a YOLOv8 INT8 engine file by following the steps from https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/docs/INT8Calibration.md. However, I get this error when running "deepstream-app -c deepstream_app_config.txt":
ERROR: [TRT]: [checkSanity.cpp::checkLinks::218] Error Code 2: Internal Error (Assertion item.second != nullptr failed. region should have been removed from Graph::regions)
Segmentation fault (core dumped)
Could you help me with this issue?
Here is the configuration I ran with:
- DeepStream 7.1
- TensorRT 10.3
- CUDA 12.6
Same issue, and I found https://github.com/ultralytics/ultralytics/issues/15806. It says that downgrading TensorRT to 8.6.1.6 works, so maybe YOLO INT8 models are incompatible with TensorRT 10.x...
I didn't get issues running INT8 calibration with DeepStream 7.1 here.
Same issue when building the TensorRT engine:
ERROR: [TRT]: [checkSanity.cpp::checkLinks::218] Error Code 2: Internal Error (Assertion item.second != nullptr failed. region should have been removed from Graph::regions)
Same config as above:
DeepStream 7.1
TensorRT 10.3
CUDA 12.6
OpenCV 4.10
Running on a Jetson AGX Orin.
PS: I tested a few weeks ago on a Jetson Xavier NX with no issues.
Which model are you using? Can you send the full log?
I'm running YOLOv8. The issue is present only with INT8 calibration.
rapit@ubuntu:~/DeepStream-Yolo$ deepstream-app -c deepstream_app_config.txt
Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
WARNING: Deserialize engine failed because file path: /home/rapit/DeepStream-Yolo/model_b1_gpu0_int8.engine open error
0:00:00.178367820 2848 0xaaaafd225c70 WARN nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2080> [UID = 1]: deserialize engine from file :/home/rapit/DeepStream-Yolo/model_b1_gpu0_int8.engine failed
0:00:00.178429612 2848 0xaaaafd225c70 WARN nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2185> [UID = 1]: deserialize backend context from engine from file :/home/rapit/DeepStream-Yolo/model_b1_gpu0_int8.engine failed, try rebuild
0:00:00.178449676 2848 0xaaaafd225c70 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2106> [UID = 1]: Trying to create engine from model files
WARNING: INT8 calibration file not specified/accessible. INT8 calibration can be done through setDynamicRange API in 'NvDsInferCreateNetwork' implementation
Building the TensorRT Engine
File does not exist: /home/rapit/DeepStream-Yolo/calib.table
ERROR: [TRT]: [checkSanity.cpp::checkLinks::218] Error Code 2: Internal Error (Assertion item.second != nullptr failed. region should have been removed from Graph::regions)
Segmentation fault (core dumped)
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV8.txt

config_infer_primary_yoloV8.txt:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov8m.pt.onnx
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=1
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
#workspace-size=2000
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300
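For completeness, the calibration guide also has you point the custom library at a set of calibration images through environment variables before running deepstream-app. A sketch of my shell setup (variable names as I recall them from INT8Calibration.md, and the calibration.txt path is specific to my machine; double-check against the doc):

export INT8_CALIB_IMG_PATH=/home/rapit/DeepStream-Yolo/calibration.txt
export INT8_CALIB_BATCH_SIZE=1
deepstream-app -c deepstream_app_config.txt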
Looks like it's an issue with TRT 10.3 on Jetson boards. I don't have an Orin to debug, so it's hard to check this issue.
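One way to narrow it down: since DeepStream only drives the TensorRT builder here, running the builder directly on the same ONNX should show whether the crash lives in TRT itself (a suggestion, assuming trtexec from the stock TRT 10.3 install and the ONNX file from the config above):

/usr/src/tensorrt/bin/trtexec --onnx=yolov8m.pt.onnx --int8

If that also hits the checkSanity assertion, it's a pure TensorRT builder bug rather than anything in DeepStream-Yolo.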
Hi, I'm putting my issue in this thread because it is related. I'm running a YOLOX model without INT8 calibration and I also have an issue on a Jetson Orin board with TRT 10.3:
ERROR: [TRT]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node /0/backbone/stem/conv/conv/Conv.)
Segmentation fault (core dumped)
Can this error help in any way? Should we try installing lower versions? Unfortunately, that requires reflashing the board.
I'll try on a desktop GPU in the next few days to see how it runs.
@Foglia-m did you convert the .pth model with utils/export_yolox.py?
@marcoslucianops Yes I did! I also trained a YOLOv8 and managed to build the TensorRT engine in DeepStream.
For YOLOX, I also tried setting a higher opset number when converting to ONNX, but that did not work either.
@Foglia-m I don't have an Orin board to test with, so it's hard to debug this issue.
@marcoslucianops I did manage to build the TensorRT engine using DeepStream on a laptop, but it still fails on the Jetson.
Hey guys,
I can confirm that after reflashing with JetPack 6.0, which includes TensorRT 8.6.2, INT8 engine generation works as expected.
The issue only appears with JetPack 6.1, which includes TensorRT 10.3.
Hi, I also experienced this problem with TensorRT 10.3. I finally fixed it by installing TensorRT 10.4: I used the TensorRT tar file and overwrote each file, after backing up the default JetPack files of course. The DLA also works.
I saw that you managed to install TensorRT 10.4 manually on your Jetson board, which is quite uncommon given the tightly integrated JetPack environment. Could you please share a detailed description of your process? It would be helpful for anyone trying to overcome similar INT8 calibration issues with TRT 10.3. Thank you!
Hi, I don't remember the details, but the first time I downloaded the .tar file from TensorRT and replaced each file in it with the corresponding one in the file system, searching them out one by one and backing them up first. I also remember correcting some symbolic links. A tedious and crafty job; I only mention it because it worked for me that time.
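To give a rough idea of what that looked like, a sketch from memory (assuming the TensorRT 10.4 tar was extracted to ~/TensorRT-10.4.0.26 on an aarch64 Jetson; exact filenames, versions, and symlinks will differ on your system, and the same has to be repeated for the other libnv* libraries and headers):

sudo mkdir -p /opt/trt-backup
sudo cp -a /usr/lib/aarch64-linux-gnu/libnvinfer* /opt/trt-backup/
sudo cp -a ~/TensorRT-10.4.0.26/lib/libnvinfer* /usr/lib/aarch64-linux-gnu/
sudo ln -sf /usr/lib/aarch64-linux-gnu/libnvinfer.so.10.4.0 /usr/lib/aarch64-linux-gnu/libnvinfer.so.10
sudo ldconfig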
But the last few times I had to upgrade TensorRT again; at least with JetPack 6.2 I found a much simpler and faster solution:
- wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
- sudo dpkg -i cuda-keyring_1.1-1_all.deb
- sudo apt-get update
- (optional) sudo apt-get -y install cuda-toolkit-12-6 cuda-compat-12-6
- sudo apt install tensorrt
This way it installs the latest available version of TensorRT (currently 10.7.0.23), and I had no problems using INT8 and even DLA with YOLOv8 models.
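As a quick sanity check after the upgrade, you can confirm which version is actually active (the Python line assumes the python3-libnvinfer bindings were upgraded along with the libraries):

dpkg -l | grep -i tensorrt
python3 -c "import tensorrt as trt; print(trt.__version__)"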
PS: There is also another possibility: adding a local repository. You can install the .deb to force the local repository, following the steps in the documentation:
os="ubuntuxx04"
tag="10.x.x-cuda-x.x"
sudo dpkg -i nv-tensorrt-local-repo-${os}-${tag}_1.0-1_amd64.deb
sudo cp /var/nv/nv-tensorrt-local-repo-${os}-${tag}-${tag}/*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
If this does not work, we can go directly to the directory /var/nv/nv-tensorrt-local-repo-${os}-${tag} and install all the .deb packages with dpkg -i *.deb.
Thanks a lot for sharing your approach! The manual file replacement sounds tedious indeed, but it's great to know it worked. The newer method with apt install tensorrt is much cleaner. Thanks again :) I really appreciate the detailed explanation!
It's an issue with the Orin and TRT 10.3. I will be able to fix it when I get one of the Orin boards to debug.
You may not need to buy an Orin board, because this isn't an isolated case: I hit the same error in a DeepStream 7.1 container on an Intel 13900 + RTX 4090. I'll upgrade TensorRT and see if that solves the problem.
I've solved the calib.table problem on DeepStream 7.1. My approach: first generate a calib.table with Python, then use that calib.table to build the engine. The code is as follows:

import tensorrt as trt
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import cv2
import os
ONNX_PATH = "v8s_640_p234.onnx" CACHE_PATH = "calib.table" CALIB_DIR = "../calibration_data/" BATCH_SIZE = 64 INPUT_SHAPE = (3, 640, 640)
class Calibrator(trt.IInt8EntropyCalibrator2): def init(self, img_dir, batch_size, input_shape): super().init() # Required for TensorRT calibrator classes self.img_paths = [os.path.join(img_dir, f) for f in os.listdir(img_dir) if f.endswith(('.jpg', '.png'))] self.batch_size = batch_size self.input_shape = input_shape self.current_index = 0 self.device_input = cuda.mem_alloc(trt.volume(input_shape) * batch_size * np.float32().nbytes)
def get_batch_size(self):
return self.batch_size
def get_batch(self, names):
if self.current_index + self.batch_size > len(self.img_paths):
return None
batch = np.zeros((self.batch_size, *self.input_shape), dtype=np.float32)
for i in range(self.batch_size):
img = cv2.imread(self.img_paths[self.current_index + i])
img = cv2.resize(img, (self.input_shape[2], self.input_shape[1]))
img = img.transpose(2, 0, 1) / 255.0
batch[i] = img
self.current_index += self.batch_size
cuda.memcpy_htod(self.device_input, batch)
return [int(self.device_input)]
def read_calibration_cache(self):
if os.path.exists(CACHE_PATH):
with open(CACHE_PATH, "rb") as f:
return f.read()
def write_calibration_cache(self, cache):
with open(CACHE_PATH, "wb") as f:
f.write(cache)
TRT_LOGGER = trt.Logger(trt.Logger.INFO) builder = trt.Builder(TRT_LOGGER) network_flags = (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) network = builder.create_network(network_flags) parser = trt.OnnxParser(network, TRT_LOGGER)
with open(ONNX_PATH, "rb") as f: parser.parse(f.read())
config = builder.create_builder_config() config.set_flag(trt.BuilderFlag.INT8) config.int8_calibrator = Calibrator(CALIB_DIR, BATCH_SIZE, INPUT_SHAPE)
config.max_workspace_size = 1 << 30
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
profile = builder.create_optimization_profile() profile.set_shape("input", (1, *INPUT_SHAPE), (BATCH_SIZE, *INPUT_SHAPE), (BATCH_SIZE, *INPUT_SHAPE)) config.add_optimization_profile(profile)
engine = builder.build_serialized_network(network, config)
Calibration cache will be saved as calib.cache
with open("yolov8s.engine", "wb") as f:
f.write(engine)
print("INT8 TensorRT engine saved to yolov8s.engine")
For me this is also happening in Docker with a 4090; I am also trying to calibrate an INT8 model.
I solved it by upgrading TensorRT to 10.12.0 with sudo apt install tensorrt, as @jcgassoloncan suggested.
Everyone with this issue, please use TensorRT 10.4 or newer.
To install the 10.4 version with dGPU:
sudo apt-get install libnvinfer-dev=10.4.0.26-1+cuda12.6 libnvinfer-dispatch-dev=10.4.0.26-1+cuda12.6 libnvinfer-dispatch10=10.4.0.26-1+cuda12.6 libnvinfer-headers-dev=10.4.0.26-1+cuda12.6 libnvinfer-headers-plugin-dev=10.4.0.26-1+cuda12.6 libnvinfer-lean-dev=10.4.0.26-1+cuda12.6 libnvinfer-lean10=10.4.0.26-1+cuda12.6 libnvinfer-plugin-dev=10.4.0.26-1+cuda12.6 libnvinfer-plugin10=10.4.0.26-1+cuda12.6 libnvinfer-vc-plugin-dev=10.4.0.26-1+cuda12.6 libnvinfer-vc-plugin10=10.4.0.26-1+cuda12.6 libnvinfer10=10.4.0.26-1+cuda12.6 libnvonnxparsers-dev=10.4.0.26-1+cuda12.6 libnvonnxparsers10=10.4.0.26-1+cuda12.6 tensorrt-dev=10.4.0.26-1+cuda12.6 libnvinfer-samples=10.4.0.26-1+cuda12.6 libnvinfer-bin=10.4.0.26-1+cuda12.6 libcudnn9-cuda-12=9.3.0.75-1 libcudnn9-dev-cuda-12=9.3.0.75-1
sudo apt-mark hold libnvinfer* libnvparsers* libnvonnxparsers* libcudnn9* python3-libnvinfer* uff-converter-tf* onnx-graphsurgeon* graphsurgeon-tf* tensorrt*
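To make sure a later apt upgrade doesn't silently move you off the pinned versions, you can confirm the holds took (standard apt/dpkg tooling; package names as installed above):

apt-mark showhold
dpkg -l libnvinfer10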
I will update docs/dGPUInstalation.md soon.
@marcoslucianops does this mean we will have to wait for higher TensorRT version support on Jetson devices?
@lakshanthad I haven't tested that command on Jetson yet. I can try to check it by next week.
I seem to be having the same or a similar issue when trying to run a YOLOX model with INT8 calibration on DeepStream 7.1 with an Orin Nano. I have tried both the commands to install TensorRT 10.4 and simply updating to the latest with apt install tensorrt, but I continue to get errors when the engine starts to build. The engine builds fine when not using INT8. The error message varies between runs; some I've seen while testing are:
ERROR: [TRT]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node PWN(/0/backbone/backbone/stem/conv/act/Sigmoid).)
ERROR: [TRT]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node /0/backbone/backbone/stem/conv/conv/Conv.)
ERROR: [TRT]: Unexpected exception _Map_base::at
@Foglia-m I know you weren't doing INT8 calibration, but did you manage to get the YOLOX model working on the Jetson in the end?
I've been able to get YOLO11 to build INT8. I'm having the same problem with D-FINE and RT-DETR. This Ultralytics page was pretty useful for YOLO11.