
Internal Error (Assertion min_ <= max_ failed.)

Open janbernloehr opened this issue 3 years ago • 13 comments

Description

When converting an ONNX model to TensorRT with int8 calibration, we observe the error

[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] INFO: Starting Calibration.
[TensorRT] INFO:   Calibrated batch 0 in 2.67053 seconds.
[TensorRT] INFO:   Calibrated batch 1 in 2.61013 seconds.
...
[TensorRT] INFO:   Calibrated batch 49 in 2.70075 seconds.
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1775, GPU 6334 (MiB)
[TensorRT] ERROR: 2: [quantization.cpp::DynamicRange::65] Error Code 2: Internal Error (Assertion min_ <= max_ failed.)

This error is new in TensorRT 8. The same ONNX model converts fine on TensorRT 6.0.0 and 7.2.3. Moreover, when a calibration cache already exists, the model can also be converted on TensorRT 8.0.3 / 8.2.0. The error only occurs when converting the model on TRT 8 without a pre-existing calibration cache.

Unfortunately, I cannot share the model right now, but I would like to extract a minimal example to reproduce this issue. Any clue on how to identify which layer this error is coming from would be very helpful.

Environment

TensorRT Version: 8.0.3 and 8.2.0
NVIDIA GPU: GTX 1070, RTX 2080 Ti, RTX 3060 Ti
NVIDIA Driver Version: 470.74 and 495.29.05
CUDA Version: 11.3 and 11.4
CUDNN Version: 8.2.1 and 8.2.4
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): 2.7
PyTorch Version (if applicable): -
Baremetal or Container (if so, version): -

Relevant Files

Unfortunately, we cannot share the model right now.

Steps To Reproduce

import tensorrt as trt

trt_logger = trt.Logger(trt_logger_severity)

# The onnx parser version we currently use requires an explicit-batch network
explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(trt_logger) as builder, builder.create_network(explicit_batch) as network:
    with trt.OnnxParser(network, trt_logger) as onnx_parser, builder.create_builder_config() as config:
        # (parsing of the ONNX model via onnx_parser is omitted here)
        profile = builder.create_optimization_profile()
        trt_input = network.get_input(0)
        profile.set_shape(trt_input.name, min=[3, 896, 1792], opt=[3, 896, 1792], max=[3, 896, 1792])
        config.add_optimization_profile(profile)

        config.int8_calibrator = int8_calibrator
        config.flags |= 1 << int(trt.BuilderFlag.INT8)

        cuda_engine = builder.build_engine(network, config)

The int8 calibrator implements trt.IInt8EntropyCalibrator2.
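For context, a minimal sketch of that kind of calibrator (not the exact implementation used here; it assumes a batches iterable of contiguous float32 numpy arrays and uses pycuda for the device buffer):

import os
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        # Explicit-batch network: the batch dimension is part of the input shape.
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)  # e.g. a (3, 896, 1792) float32 array
        except StopIteration:
            return None  # no more data -> calibration stops
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reusing an existing cache skips calibration entirely (the case that works above).
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)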

janbernloehr avatar Nov 22 '21 22:11 janbernloehr

The cause of the issue was a division by zero in the network during int8 calibration. This apparently had no effect prior to TRT 8 but now triggers this error. It would have been very helpful if the error message contained more information, in particular the layer name and the fact that the value is NaN.
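For anyone hitting this later: one way to narrow down which layer produces the NaN (a sketch of a general approach, not what was done here; model.onnx is a placeholder name and a reasonably recent Polygraphy version is assumed) is to mark every intermediate tensor as an output and have ONNX Runtime validate them:

polygraphy run model.onnx --onnxrt --onnx-outputs mark all --validate

The --validate option flags NaN/Inf values in the marked outputs, so the first flagged tensor points at the layer where the division by zero shows up.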

janbernloehr avatar Nov 27 '21 11:11 janbernloehr

Same error. CUDA 10.1, cuDNN 7.5, TensorRT installed via pip wheel.

Linaom1214 avatar Nov 28 '21 10:11 Linaom1214

Same error here: converting ultralytics/yolov5 to int8 with TensorRT 8 hits this error, while the conversion works fine with TensorRT 7. Any solution?

ShaneYS avatar Apr 08 '22 07:04 ShaneYS

Same error. CUDA 10.1, cuDNN 7.5, TensorRT installed via pip wheel.

Hi, did you solve this error?

ShaneYS avatar Apr 08 '22 07:04 ShaneYS

Try IInt8MinMaxCalibrator instead of IInt8EntropyCalibrator
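If your calibrator is a Python subclass like the entropy calibrator sketch earlier in this thread, switching amounts to changing the base class (sketch, not a drop-in implementation):

class MinMaxCalibrator(trt.IInt8MinMaxCalibrator):
    # Same get_batch / get_batch_size / cache methods as the entropy calibrator sketch above;
    # only the base class, and therefore the calibration algorithm, changes.
    ...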

Data-Iab avatar Apr 19 '22 13:04 Data-Iab

Another thing you may want to check is the calibration cache file of the int8_calibrator. Make sure there is no stale file with the same name; if there is, remove it first.
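A minimal sketch of that cleanup, assuming the cache is written to calibration.cache (hypothetical name; use whatever path your calibrator actually writes):

import os

cache_file = "calibration.cache"  # hypothetical; match your calibrator's cache path
if os.path.exists(cache_file):
    os.remove(cache_file)  # force a fresh calibration rather than reusing a stale cache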

Data-Iab avatar Apr 29 '22 11:04 Data-Iab

@janbernloehr @ShaneYS @Linaom1214 do you have a model to share for developers to reproduce?

oxana-nvidia avatar Jun 23 '22 03:06 oxana-nvidia

@oxana-nvidia I have the same error, also with YOLOv5. Attached is the ONNX file. Using IInt8MinMaxCalibrator as @Data-Iab suggested works, but the DLA on Jetson devices requires IInt8EntropyCalibrator2. When using tensorrtx, which implements YOLOv5 directly in C++ with the TRT API, IInt8EntropyCalibrator2 works, though.

yolov5s.onnx.zip

Maxung avatar Jun 28 '22 08:06 Maxung

@Maxung which TensorRT version are you using? Is the issue still present if you use TensorRT 8.4? Which shape are you using? Could you please provide more details about your setup? Please use this template if possible:

TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:

I've tried to reproduce the issue using the polygraphy tool, but no issues were detected (TensorRT 8.4, CUDA 11.6, Titan RTX):

polygraphy run yolov5s.onnx --trt --int8
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] trt-runner-N0-06/28/22-09:54:11     | Activating and starting inference
[06/28/2022-09:54:12] [TRT] [W] onnx2trt_utils.cpp:367: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     Configuring with profiles: [Profile().add('images', min=[32, 3, 640, 640], opt=[32, 3, 640, 640], max=[32, 3, 640, 640])]
[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: False, INT8: True, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUBLAS_LT', 'CUDNN', 'EDGE_MASK_CONVOLUTIONS', 'JIT_CONVOLUTIONS']
    Safety Restricted    | False
    Refittable           | False
    Calibrator           | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
    Profiles             | 1 profile(s)
[06/28/2022-09:54:57] [TRT] [W] Missing scale and zero-point for tensor onnx::Transpose_338, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
...
[06/28/2022-09:58:51] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[I] Finished engine building in 278.379 seconds
[I] trt-runner-N0-06/28/22-09:54:11
    ---- Inference Input(s) ----
    {images [dtype=float16, shape=(32, 3, 640, 640)]}
[I] trt-runner-N0-06/28/22-09:54:11
    ---- Inference Output(s) ----
    {output [dtype=float16, shape=(32, 25200, 85)]}
[I] trt-runner-N0-06/28/22-09:54:11     | Completed 1 iteration(s) in 71.7 ms | Average inference time: 71.7 ms.
[I] PASSED | Command: polygraphy run yolov5s.onnx --trt --int8

oxana-nvidia avatar Jun 28 '22 17:06 oxana-nvidia

EDIT:

I created a colab with TRT-8.4 and received the same error.

@oxana-nvidia below is the information plus a script to reproduce the error. I'm also able to run polygraphy run yolov5s.onnx --trt --int8, but not when using real data like COCO. I need a production environment, so I'm sadly not able to test with TRT 8.4.

TensorRT Version: 8.2.1.8
NVIDIA GPU: Jetson AGX Xavier
CUDA Version: 10.2
CUDNN Version: 8.2.1.32
Operating System: Jetpack 4.6.2
Python: 3.9.12

import torch
import numpy as np
import cv2
import glob

class CocoImageDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, img_size=1280, max_images=-1):
        self.img_path = glob.glob(img_dir + "*")
        if max_images > 0:  # otherwise use all images
            self.img_path = self.img_path[:max_images]
        self.img_size = (img_size, img_size)
        self.stride = 32

    def __len__(self):
        return len(self.img_path)

    def __getitem__(self, idx):
        img = cv2.imread(self.img_path[idx])
        img = letterbox(img, self.img_size, stride=self.stride)[0]
        img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
        img = np.ascontiguousarray(img, dtype=np.float32)
        img /= 255
        return img

def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=False, scaleFill=True, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

def load_data():
    coco = CocoImageDataset('cocoval_folder', 640)
    loader = torch.utils.data.DataLoader(coco, 32, shuffle=False, pin_memory=True)
    num_batches = 1
    for batch in loader:
        print(f"Batch {num_batches}")
        yield {"images": batch.cuda().data_ptr()}
        if num_batches == 3:
            break
        num_batches += 1

You need the COCO validation set downloaded and unzipped, and its folder passed to CocoImageDataset. Run everything with polygraphy convert yolov5s.onnx --int8 --data-loader-script ./int8dataloader.py -o yolov5s.engine. I then get the following output, with the same error:

[06/29/2022-13:05:51] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     Configuring with profiles: [Profile().add(images, min=[32, 3, 640, 640], opt=[32, 3, 640, 640], max=[32, 3, 640, 640])]
[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: False, INT8: True, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources       | ['CUBLAS', 'CUDNN']
    Safety Restricted    | False
    Calibrator           | Calibrator(<generator object load_data at 0x7f83734ac0>, BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
    Profiles             | 1 profile(s)
Batch 1
Batch 2
Batch 3
[06/29/2022-13:06:59] [TRT] [E] 2: [quantization.cpp::DynamicRange::70] Error Code 2: Internal Error (Assertion min_ <= max_ failed. )
[06/29/2022-13:06:59] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly

Maxung avatar Jun 29 '22 11:06 Maxung

@Maxung Thanks for the detailed repro! I've created an internal bug to investigate. Internal number: 3700165 cc @nvpohanh @jhalakp-nvidia

oxana-nvidia avatar Jun 29 '22 23:06 oxana-nvidia

We are still debugging the issue; there are no updates on the root cause at this point. Our engineering team suggests a workaround for now: change the input image type from FP32 to FP16. Specifically, in int8dataloader.py change img = np.ascontiguousarray(img, dtype=np.float32) to img = np.ascontiguousarray(img, dtype=np.float16)

@Maxung please try if it is applicable to your case.

oxana-nvidia avatar Aug 12 '22 18:08 oxana-nvidia

@oxana-nvidia thank you for the follow-up. I forgot to mention it here, but I looked for alternatives, and using the Polygraphy tool with the custom data loader works fine. So I'm not sure exactly where the error is, but for me that solved the problem.

Maxung avatar Aug 12 '22 18:08 Maxung

@Maxung the root cause is indeed the mismatch between your input data type (fp32) and the ONNX model's input type (fp16). If the input data is a numpy array, polygraphy checks whether its type matches the expected input type, but in the case of a data_ptr, polygraphy skips the type check and assumes the input data already has the type the ONNX model expects. Float values may then be processed incorrectly when the calibrator tries to "cast" fp32 data to fp16, which can write NaN values to memory. The error surfaces during creation of the quantization histogram, which reports the NaN value.
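So when yielding raw device pointers, the data loader has to provide data already in the model's input dtype. A minimal sketch of the adjustment to the load_data() generator above, assuming the exported yolov5s.onnx expects fp16 input:

    for batch in loader:
        # Cast to fp16 to match the ONNX input type before handing polygraphy a raw
        # device pointer; no dtype check or conversion happens on the data_ptr path.
        batch = batch.half().cuda()
        yield {"images": batch.data_ptr()}

Alternatively, yielding the arrays themselves instead of a data_ptr lets polygraphy run the dtype check described above.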

yibinl-nvidia avatar Aug 17 '22 00:08 yibinl-nvidia

I have the same issue, but in a C++ development environment. The weird part is that the error occurs on an L4 GPU and disappears on a T4 GPU (same YOLOv5 int8 calibration code; TensorRT 8.6.1.6).

clveryang avatar Oct 25 '23 07:10 clveryang