TensorRT
Internal Error (Assertion min_ <= max_ failed.)
Description
When converting an ONNX model to TensorRT with int8 calibration, we observe the following error:
[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] INFO: Starting Calibration.
[TensorRT] INFO: Calibrated batch 0 in 2.67053 seconds.
[TensorRT] INFO: Calibrated batch 1 in 2.61013 seconds.
...
[TensorRT] INFO: Calibrated batch 49 in 2.70075 seconds.
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1775, GPU 6334 (MiB)
[TensorRT] ERROR: 2: [quantization.cpp::DynamicRange::65] Error Code 2: Internal Error (Assertion min_ <= max_ failed.)
This error is new in TensorRT 8. The same ONNX model converts fine on TensorRT 6.0.0 and 7.2.3. Moreover, when a calibration cache already exists, the model can also be converted on TensorRT 8.0.3 / 8.2.0. The error only occurs when converting the model on TRT 8 without a pre-existing calibration cache.
Unfortunately, I cannot share the model right now, so I would like to extract a minimal example to reproduce this issue. Any clue on how to identify which layer this error is coming from would be very helpful.
Environment
TensorRT Version: 8.0.3 and 8.2.0
NVIDIA GPU: GTX 1070, RTX 2080 Ti, RTX 3060 Ti
NVIDIA Driver Version: 470.74 and 495.29.05
CUDA Version: 11.3 and 11.4
CUDNN Version: 8.2.1 and 8.2.4
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): 2.7
PyTorch Version (if applicable): -
Baremetal or Container (if so, version): -
Relevant Files
Unfortunately, we cannot share the model right now.
Steps To Reproduce
import tensorrt as trt

trt_logger = trt.Logger(trt_logger_severity)  # trt_logger_severity and int8_calibrator are defined elsewhere
# The onnx parser version we currently use requires explicit batch mode
explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(trt_logger) as builder, builder.create_network(explicit_batch) as network:
    with trt.OnnxParser(network, trt_logger) as onnx_parser, builder.create_builder_config() as config:
        # ... the ONNX model is parsed into `network` via onnx_parser here (omitted) ...
        profile = builder.create_optimization_profile()
        trt_input = network.get_input(0)
        profile.set_shape(trt_input.name, min=[3, 896, 1792], opt=[3, 896, 1792], max=[3, 896, 1792])
        config.add_optimization_profile(profile)
        config.int8_calibrator = int8_calibrator
        config.flags = config.flags | (1 << int(trt.BuilderFlag.INT8))
        cuda_engine = builder.build_engine(network, config)
The int8 calibrator implements trt.IInt8EntropyCalibrator2.
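Since I cannot share the actual calibrator code, here is only a minimal sketch of the usual Python pattern it follows (EntropyCalibrator, data_loader, input_shape, and calibration.cache are placeholder names, and pycuda is assumed for the device buffer):

import os
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, data_loader, batch_size, input_shape, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.data_loader = iter(data_loader)  # yields np.float32 batches of shape (batch_size, *input_shape)
        self.batch_size = batch_size
        self.cache_file = cache_file
        # One device buffer large enough for a full calibration batch.
        nbytes = int(batch_size * np.prod(input_shape) * np.dtype(np.float32).itemsize)
        self.device_input = cuda.mem_alloc(nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.data_loader)
        except StopIteration:
            return None  # no more data: calibration ends
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch, dtype=np.float32))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Returning None forces recalibration instead of reusing a cache file.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)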
The cause of the issue was a division by zero in the network during int8 calibration. It seems this had no effect prior to TRT 8, but it now causes this error. It would have been very helpful if the error message contained more information, in particular the layer name and the fact that the value is NaN.
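To illustrate the mechanism with a standalone numpy sketch (not the actual network): an IEEE floating-point division by zero does not raise, it silently produces inf or NaN, and any comparison involving NaN is false, which is analogous to an assertion like min_ <= max_ failing:

import numpy as np

x = np.array([1.0, 0.0, -1.0], dtype=np.float32)
with np.errstate(divide="ignore", invalid="ignore"):
    y = x / np.float32(0.0)
print(y)                   # [ inf  nan -inf]
# A NaN in the activation statistics makes the computed dynamic range invalid:
print(y.min() <= y.max())  # False, because any comparison with NaN is False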
Same error. CUDA 10.1, cuDNN 7.5, TensorRT installed via pip wheel.
Same error. I converted ultralytics/yolov5 to int8 with TensorRT 8 and encountered this error. Conversion is fine with TensorRT 7. Any solution?
Hi, did you solve this error?
Try IInt8MinMaxCalibrator instead of IInt8EntropyCalibrator.
Another thing you may want to check is the calibration cache file of the int8_calibrator. Make sure there is no previous file with the same name; if so, remove it first.
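For a calibrator implemented as a Python subclass (like the sketch earlier in this thread), both suggestions amount to small changes; a rough sketch with placeholder names:

import os
import tensorrt as trt

# 1. The calibration algorithm is selected by the base class: derive the calibrator
#    from trt.IInt8MinMaxCalibrator instead of trt.IInt8EntropyCalibrator2;
#    get_batch_size / get_batch / read_ and write_calibration_cache stay the same.
class MinMaxCalibrator(trt.IInt8MinMaxCalibrator):
    def __init__(self):
        trt.IInt8MinMaxCalibrator.__init__(self)
        # ... same buffers and data loader as in the entropy-calibrator sketch ...

# 2. Remove any stale calibration cache before building, otherwise TensorRT
#    reuses it and skips calibrating the new data entirely.
cache_file = "calibration.cache"  # hypothetical file name
if os.path.exists(cache_file):
    os.remove(cache_file)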
@janbernloehr @ShaneYS @Linaom1214 do you have a model to share for developers to reproduce?
@oxana-nvidia I have the same error, also with YOLOv5. Attached is the ONNX file. Using IInt8MinMaxCalibrator as @Data-Iab suggested works, but for the DLA in Jetson devices IInt8EntropyCalibrator2 is required. When using tensorrtx, which implements YOLOv5 directly in C++ with the TRT API, IInt8EntropyCalibrator2 does work, though.
@Maxung which TensorRT version are you using? Is the issue still present if you use TensorRT 8.4? Which shape are you using? Could you please provide more details about your setup? Please use this template if possible:
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
I've tried to reproduce the issue using the polygraphy tool, but no issues were detected (TensorRT 8.4, CUDA 11.6, Titan RTX):
polygraphy run yolov5s.onnx --trt --int8
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] trt-runner-N0-06/28/22-09:54:11 | Activating and starting inference
[06/28/2022-09:54:12] [TRT] [W] onnx2trt_utils.cpp:367: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Configuring with profiles: [Profile().add('images', min=[32, 3, 640, 640], opt=[32, 3, 640, 640], max=[32, 3, 640, 640])]
[I] Building engine with configuration:
Workspace | 16777216 bytes (16.00 MiB)
Precision | TF32: False, FP16: False, INT8: True, Obey Precision Constraints: False, Strict Types: False
Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN', 'EDGE_MASK_CONVOLUTIONS', 'JIT_CONVOLUTIONS']
Safety Restricted | False
Refittable | False
Calibrator | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
Profiles | 1 profile(s)
[06/28/2022-09:54:57] [TRT] [W] Missing scale and zero-point for tensor onnx::Transpose_338, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
...
[06/28/2022-09:58:51] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[I] Finished engine building in 278.379 seconds
[I] trt-runner-N0-06/28/22-09:54:11
---- Inference Input(s) ----
{images [dtype=float16, shape=(32, 3, 640, 640)]}
[I] trt-runner-N0-06/28/22-09:54:11
---- Inference Output(s) ----
{output [dtype=float16, shape=(32, 25200, 85)]}
[I] trt-runner-N0-06/28/22-09:54:11 | Completed 1 iteration(s) in 71.7 ms | Average inference time: 71.7 ms.
[I] PASSED | Command: polygraphy run yolov5s.onnx --trt --int8
EDIT: I created a Colab with TRT 8.4 and received the same error.
@oxana-nvidia following is the information + script to reproduce the error. I'm also able to run polygraphy run yolov5s.onnx --trt --int8 successfully, but not when using real data like COCO. I need a production environment, so I'm sadly not able to test with TRT 8.4.
TensorRT Version: 8.2.1.8
NVIDIA GPU: Jetson AGX Xavier
CUDA Version: 10.2
CUDNN Version: 8.2.1.32
Operating System: Jetpack 4.6.2
Python: 3.9.12
import torch
import numpy as np
import cv2
import glob


class CocoImageDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, img_size=1280, max_images=-1):
        self.max_images = max_images if max_images > 0 else -1
        self.img_path = glob.glob(img_dir + "*")[:self.max_images]
        self.img_size = (img_size, img_size)
        self.stride = 32

    def __len__(self):
        return len(self.img_path)

    def __getitem__(self, idx):
        img = cv2.imread(self.img_path[idx])
        img = letterbox(img, self.img_size, stride=self.stride)[0]
        img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
        img = np.ascontiguousarray(img, dtype=np.float32)
        img /= 255
        return img


def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=False, scaleFill=True, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)
    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios
    dw /= 2  # divide padding into 2 sides
    dh /= 2
    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)


def load_data():
    coco = CocoImageDataset('cocoval_folder', 640)
    loader = torch.utils.data.DataLoader(coco, 32, shuffle=False, pin_memory=True)
    num_batches = 1
    for batch in loader:
        print(f"Batch {num_batches}")
        yield {"images": batch.cuda().data_ptr()}
        if num_batches == 3:
            break
        num_batches += 1
You need to download and unzip the COCO validation set and provide its folder path to CocoImageDataset.
Run everything with polygraphy convert yolov5s.onnx --int8 --data-loader-script ./int8dataloader.py -o yolov5s.engine
I then get the same error; the output is:
[06/29/2022-13:05:51] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Configuring with profiles: [Profile().add(images, min=[32, 3, 640, 640], opt=[32, 3, 640, 640], max=[32, 3, 640, 640])]
[I] Building engine with configuration:
Workspace | 16777216 bytes (16.00 MiB)
Precision | TF32: False, FP16: False, INT8: True, Obey Precision Constraints: False, Strict Types: False
Tactic Sources | ['CUBLAS', 'CUDNN']
Safety Restricted | False
Calibrator | Calibrator(<generator object load_data at 0x7f83734ac0>, BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
Profiles | 1 profile(s)
Batch 1
Batch 2
Batch 3
[06/29/2022-13:06:59] [TRT] [E] 2: [quantization.cpp::DynamicRange::70] Error Code 2: Internal Error (Assertion min_ <= max_ failed. )
[06/29/2022-13:06:59] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
@Maxung Thanks for the detailed repro! I've created an internal bug to investigate. Internal number: 3700165 cc @nvpohanh @jhalakp-nvidia
We are still debugging the issue, no updates about root cause at this point.
Our engineering team suggests a workaround for now: change the input image type from FP32 to FP16. Specifically, in int8dataloader.py, change
img = np.ascontiguousarray(img, dtype=np.float32)
to
img = np.ascontiguousarray(img, dtype=np.float16)
@Maxung please try this and check whether it is applicable to your case.
@oxana-nvidia thank you for the follow-up. I forgot to mention it here, but I searched for alternatives, and when using the Polygraphy tool with the custom data loader it works fine. So I'm not sure where exactly the error is, but for me the problem is solved.
@Maxung the root cause is indeed the mismatch between your input data type (fp32) and the ONNX model's input type (fp16). If the input data is a numpy array, polygraphy checks whether its type matches the expected input type, but in the case of a data_ptr, polygraphy skips the type check and assumes the input data has the same type as the model's expected input. Float values may be processed incorrectly when the calibrator tries to "cast" fp32 data to fp16, which can write NaN values to memory. The error is then exposed during creation of the quantization histogram, where it throws an error on the NaN value.
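One possible guard on the data-loader side is to read the declared input dtype from the ONNX file and cast the batch before taking the raw pointer. This is only a sketch; cast_batch_to_model_dtype is a hypothetical helper and yolov5s.onnx the assumed model path:

import onnx
import torch

def cast_batch_to_model_dtype(batch: torch.Tensor, onnx_path: str = "yolov5s.onnx") -> torch.Tensor:
    # Read the element type the model declares for its first input and cast the
    # batch to match, since polygraphy cannot type-check a raw data_ptr() for us.
    elem_type = onnx.load(onnx_path).graph.input[0].type.tensor_type.elem_type
    return batch.half() if elem_type == onnx.TensorProto.FLOAT16 else batch.float()

# In load_data(), the yield then becomes:
#     yield {"images": cast_batch_to_model_dtype(batch.cuda()).data_ptr()}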
I have the same issue, but in a C++ development environment. The weird point is that the error occurs on an L4 GPU and disappears on a T4 GPU (same YOLOv5 int8 calibration code; TensorRT is 8.6.1.6).