
Error reported when building a quantized model with clip ranges via the Python API

Open matudouchen opened this issue 3 years ago • 7 comments

Description

The same steps generate an INT8 engine successfully on an RTX 4000 (TensorRT 8.0, CUDA 11.3), but on a Jetson AGX Orin (TensorRT 8.4, CUDA 11.4) an error is reported (an illegal memory access was encountered). If the number of quantized layers is reduced, the Jetson can succeed, like this:

Set dynamic range of x as [-0.9921314716339111, 0.9921314716339111]
Set dynamic range of 1472 as [-5.843513488769531, 5.843513488769531]
Set dynamic range of 1485 as [-3.6356899738311768, 3.6356899738311768]
Set dynamic range of 1498 as [-4.59166955947876, 4.59166955947876]
Set dynamic range of 1520 as [-4.642116069793701, 4.642116069793701]
Set dynamic range of 1533 as [-5.632355213165283, 5.632355213165283]
Set dynamic range of 1546 as [-5.489744663238525, 5.489744663238525]
Set dynamic range of 1557 as [-5.353246688842773, 5.353246688842773]
onnx2trt.py:86: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config)
[07/07/2022-17:02:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[07/07/2022-17:02:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[07/07/2022-17:02:27] [TRT] [E] 1: [resizingAllocator.cpp::allocate::62] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[07/07/2022-17:02:27] [TRT] [E] 1: [resizingAllocator.cpp::allocate::62] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
.......

With the reduced set of quantized layers:

Set dynamic range of x as [-0.9921314716339111, 0.9921314716339111]
Set dynamic range of 1472 as [-5.843513488769531, 5.843513488769531]
Set dynamic range of 1485 as [-3.6356899738311768, 3.6356899738311768]
Set dynamic range of 1498 as [-4.59166955947876, 4.59166955947876]
onnx2trt.py:86: DeprecationWarning: Use build_serialized_network instead.
  engine = builder.build_engine(network, config)
success!
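For context on what these clip ranges mean: setting a tensor's dynamic range to [-amax, amax] tells TensorRT to quantize that tensor symmetrically to INT8 with scale amax / 127. A minimal pure-Python sketch of that arithmetic (an illustration only, not the TensorRT API):

```python
def quantize_int8(value, amax):
    """Symmetric INT8 quantization against a clip range [-amax, amax]."""
    scale = amax / 127.0                    # real-valued size of one INT8 step
    clipped = max(-amax, min(amax, value))  # values outside the range saturate
    q = round(clipped / scale)              # integer code in [-127, 127]
    return q, q * scale                     # (code, dequantized approximation)

# Using the range reported for tensor "x" in the log above
q, deq = quantize_int8(0.5, 0.9921314716339111)
```

Here 0.5 maps to code 64 and dequantizes back to roughly 0.5, off by at most half a quantization step.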

Environment

TensorRT Version: 8.4
NVIDIA GPU: Jetson AGX Orin
NVIDIA Driver Version:
CUDA Version: 11.4
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

-------------- The current device memory allocations dump as below --------------
[0]:4194304 :HybridGlobWriter in reserveMemory: at optimizer/common/globWriter.cpp: 398 idx: 11246 time: 0.0096584
[0x2334a9000]:13483200 :DeviceActivationSize in reserveNetworkTensorMemory: at optimizer/common/tactic/optimizer.cpp: 353 idx: 4 time: 0.00907999
[07/07/2022-11:38:33] [TRT] [W] Requested amount of GPU memory (4194304 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[07/07/2022-11:38:33] [TRT] [W] Skipping tactic 0 due to insuficient memory on requested size of 4194304 detected for tactic 1237784342446422381.
[07/07/2022-11:38:33] [TRT] [E] 1: [resizingAllocator.cpp::allocate::62] Error Code 1: Cuda Runtime (an illegal memory access was encountered)

[07/07/2022-15:35:17] [TRT] [E] 1: Unexpected exception std::exception
[07/07/2022-15:35:17] [TRT] [E] 1: [resizingAllocator.cpp::allocate::62] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[07/07/2022-15:35:17] [TRT] [E] 1: [resizingAllocator.cpp::allocate::62] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
.......
[07/07/2022-15:35:17] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[07/07/2022-15:35:17] [TRT] [E] 10: [optimizer.cpp::computeCosts::3826] Error Code 10: Internal Error (Could not find any implementation for node Conv_9 + PWN(PWN(Sigmoid_10), Mul_11).)
Traceback (most recent call last):
  File "onnx2trt.py", line 163, in onnx2trt(args.onnx_path,
  File "onnx2trt.py", line 89, in onnx2trt
    f.write(bytearray(engine.serialize()))
AttributeError: 'NoneType' object has no attribute 'serialize'

matudouchen avatar Jul 07 '22 09:07 matudouchen

Sorry, I don't get your question here. Can you describe how I can reproduce the error on my side?

zerollzeng avatar Jul 07 '22 13:07 zerollzeng

If the number of quantization layers is reduced, the Jetson can succeed like this

Do you mean you remove some Q/DQ nodes?

zerollzeng avatar Jul 07 '22 13:07 zerollzeng

Sorry for not describing my problem clearly. I am trying to generate a quantized yolov5-6.0s model on the Jetson Orin platform, using a calibration file and an implicitly quantized ONNX model. While trying to narrow down the problem, I removed some calibration layers from the file (cut off about half of the calibration scales), and then the Jetson AGX Orin could quantize yolov5s successfully. The calibration file and code are as follows. File:

{
    "tensorrt": {
        "blob_range": {
            "images": 0.9921314716339111,
            "124": 28.302440643310547,
            "127": 34.797706604003906,
            "130": 10.161649703979492,
            "133": 7.030753135681152,
            "136": 15.052501678466797,
            "141": 20.115360260009766,
            "144": 9.726767539978027,
            "147": 5.102614402770996,
            "150": 1.3136987686157227,
            "153": 5.039309501647949,
            "156": 3.079401731491089,
            "157": 3.166759729385376,
            "160": 4.909318923950195,
            "163": 6.238847255706787,
            "168": 6.120404243469238,
            "171": 4.305610656738281,
            "174": 4.600292682647705,
            "177": 1.7588152885437012,
            "180": 5.849929332733154,
            "183": 2.5014913082122803,
            "184": 2.582460880279541,
            "187": 5.261338233947754,
            "190": 4.004266738891602,
            "191": 4.250340938568115,
            "194": 5.125842571258545,
            "197": 7.007857322692871,
            "202": 7.305993556976318,
            "205": 5.617528438568115,
            "208": 5.44209098815918,
            "211": 3.301093578338623,
            "214": 9.096861839294434,
            "217": 8.292915344238281,
            "222": 8.344144821166992,
            "225": 5.890171527862549,
            "228": 5.448906421661377,
            "229": 5.859041690826416,
            "230": 5.885444641113281,
            "232": 5.8786468505859375,
            "235": 4.684720516204834,
            "238": 5.623222827911377,
            "244": 5.843513488769531,
            "247": 3.6356899738311768,
            "250": 4.59166955947876,
            "257": 4.642116069793701,
            "260": 5.632355213165283,
            "263": 5.489744663238525,
            "269": 5.353246688842773,
            "272": 2.2172038555145264,
            "275": 2.6593825817108154,
            "282": 5.755550861358643,
            "285": 19.508153915405273,
            "289": 5.866626739501953,
            "292": 4.959685325622559,
            "295": 3.8385074138641357,
            "302": 6.417641639709473,
            "305": 21.68535804748535,
            "309": 6.783422946929932,
            "312": 7.433841705322266,
            "315": 6.779964447021484,
            "322": 7.765364646911621,
            "325": 17.314800262451172
        }
    }
}
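The calibration file stores one amax value per tensor name; the build script below turns each entry into the symmetric (-amax, amax) tuple assigned to tensor.dynamic_range. A minimal stdlib-only sketch of that parsing step, using an inline two-entry excerpt of the file above (no TensorRT needed):

```python
import json

# Inline excerpt of the clip-ranges file above (two entries, for brevity)
CALIB_JSON = '''
{
    "tensorrt": {
        "blob_range": {
            "images": 0.9921314716339111,
            "124": 28.302440643310547
        }
    }
}
'''

def load_dynamic_ranges(text):
    """Map each tensor name to the symmetric (-amax, amax) range TensorRT expects."""
    blob_range = json.loads(text)["tensorrt"]["blob_range"]
    return {name: (-amax, amax) for name, amax in blob_range.items()}

ranges = load_dynamic_ranges(CALIB_JSON)
```

Each tuple in `ranges` is what the build script assigns to a matching network tensor's `dynamic_range` attribute.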

code:

import tensorrt as trt
import os
import json
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger()
def get_engine(onnx_file_path, engine_file_path=""):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine():
        with trt.Builder(TRT_LOGGER) as builder, builder.create_network(
            EXPLICIT_BATCH
        ) as network, builder.create_builder_config() as config, trt.OnnxParser(
            network, TRT_LOGGER
        ) as parser, trt.Runtime(
            TRT_LOGGER
        ) as runtime:
            config.max_workspace_size = 1 << 32  # 4GB
            config.set_flag(trt.BuilderFlag.INT8)
            builder.max_batch_size = 1
            if not os.path.exists(onnx_file_path):
                print(
                    "ONNX file {} not found".format(onnx_file_path)
                )
                exit(0)
            print("Loading ONNX file from path {}...".format(onnx_file_path))
            with open(onnx_file_path, "rb") as model:
                print("Beginning ONNX file parsing")
                if not parser.parse(model.read()):
                    print("ERROR: Failed to parse the ONNX file.")
                    for error in range(parser.num_errors):
                        print(parser.get_error(error))
                    return None
            with open('/home/neil/Downloads/test/YOLO_kuku_clip_ranges.json', 'r') as f:
                dynamic_range = json.load(f)['tensorrt']['blob_range']
            for input_index in range(network.num_inputs):
                input_tensor = network.get_input(input_index)
                if input_tensor.name in dynamic_range:
                    amax = dynamic_range[input_tensor.name]
                    input_tensor.dynamic_range = (-amax, amax)
                    print(f'Set dynamic range of {input_tensor.name} as [{-amax}, {amax}]')
            for layer_index in range(network.num_layers):
                layer = network.get_layer(layer_index)
                output_tensor = layer.get_output(0)
                if output_tensor.name in dynamic_range:
                    amax = dynamic_range[output_tensor.name]
                    output_tensor.dynamic_range = (-amax, amax)
                    print(f'Set dynamic range of {output_tensor.name} as [{-amax}, {amax}]')
            print("Completed parsing of ONNX file")
            print("Building an engine from file {}; this may take a while...".format(onnx_file_path))
            plan = builder.build_serialized_network(network, config)
            if plan is None:
                # build_serialized_network returns None on failure; bail out
                # instead of crashing later with 'NoneType' attribute errors
                print("ERROR: Failed to build the engine.")
                return None
            engine = runtime.deserialize_cuda_engine(plan)
            print("Completed creating Engine")
            with open(engine_file_path, "wb") as f:
                f.write(plan)
            return engine
    if os.path.exists(engine_file_path):
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()
def main():
    onnx_file_path = "/home/neil/Downloads/test/v5s_6.0.onnx"
    engine_file_path = "/home/neil/Downloads/test/int8.engine"
    get_engine(onnx_file_path, engine_file_path)
if __name__ == "__main__":
    main()

matudouchen avatar Jul 08 '22 10:07 matudouchen

I still don't quite understand your question; you can reply in Chinese :-)

Also, why would you need to delete some of the scales?

zerollzeng avatar Jul 08 '22 13:07 zerollzeng

Hi, thanks a lot. When I previously tried to quantize all the nodes on version 8.4.0 I got an error (an illegal memory access was encountered), so I tried deleting some of the nodes.

Besides this issue, I also tried explicit quantization, using the trtexec tool to quantize the yolov5s model. It succeeds on TensorRT 8.0, 8.2, and 8.4.1, but fails on version 8.4.0 with the error (Could not find any implementation for node model.0.conv.weight + PPQ_Operation_2_quantize_scale_node + Conv_0+PWN(PWN(sigmoid_1), Mul_2).)

I think this is a bug in that version, but I cannot change the TensorRT version because I am using the latest JetPack package for the Jetson platform, where TensorRT is fixed at 8.4.0. You can reproduce the problem with this ONNX model: yolov5s_QDQ.zip

matudouchen avatar Jul 13 '22 05:07 matudouchen

@ttyio Is this a known issue?

zerollzeng avatar Jul 13 '22 10:07 zerollzeng

@matudouchen ,

The "could not find any implementation" issue might be because some older TRT versions were missing the conv + swish fused kernel implementation; this is fixed in newer TRT versions.

The error in the main description looks like insufficient memory in the system, but I am not sure. Have you tried enlarging your swap space? A sample command line for a 4 GB swap is:

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo /bin/sh -c 'echo  "/swapfile \t none \t swap \t defaults \t 0 \t 0" >> /etc/fstab'
sudo swapon -a

Thanks!

ttyio avatar Aug 02 '22 03:08 ttyio

Closing since there has been no response for more than 3 weeks. Please reopen if you still have questions, thanks!

ttyio avatar Sep 19 '22 07:09 ttyio