
onnx2tensorrt: Fatal Python error: Segmentation fault

ywfwyht opened this issue 3 years ago • 9 comments

Description

[09/22/2022-07:36:20] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Completed parsing ONNX file
[09/22/2022-07:36:21] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.2.1
[09/22/2022-07:36:21] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
Fatal Python error: Segmentation fault

Hi @pbridger @aaronp24 @ttyio @lukeyeager @elezar, I can only narrow the error down to this line: plan = builder.build_serialized_network(network, config). How do I solve this problem?
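
(One way to narrow this down further is to rebuild with a verbose logger, so TensorRT prints tactic/timing details right up to the crash; a minimal sketch, not from the original report:)

import tensorrt as trt

# Verbose logging shows which layer or ForeignNode was being
# timed when the process died, which helps localize the crash.
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(TRT_LOGGER)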

Environment

TensorRT Version: 8.2.3
NVIDIA GPU: RTX 3090 Ti
NVIDIA Driver Version:
CUDA Version: 11.2
CUDNN Version: 8.0.5
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.8.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.8
Baremetal or Container (if so, version):

ywfwyht avatar Sep 22 '22 07:09 ywfwyht

This looks like a usage issue; most likely something in your code is wrong. Can you provide a reproduction?

zerollzeng avatar Sep 22 '22 14:09 zerollzeng

@zerollzeng, thanks so much.

TRT_LOGGER = trt.Logger()
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
if os.path.exists(self.cfg.engine_file_path):
    # If a serialized engine exists, use it instead of building an engine.
    print("Reading engine from file {}".format(self.cfg.engine_file_path))
    with open(self.cfg.engine_file_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())
else:
    with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        builder.create_builder_config() as config, \
        trt.OnnxParser(network, TRT_LOGGER) as parser, \
        trt.Runtime(TRT_LOGGER) as runtime:
        config.max_workspace_size = 1 << 32  # 4 GiB workspace
        builder.max_batch_size = 1  # ignored for explicit-batch networks
        # config.set_flag(trt.BuilderFlag.FP16)
        if not os.path.exists(onnx_file):
            print('ONNX file {} not found.'.format(onnx_file))
            return None
        print('Loading ONNX file from path {} ...'.format(onnx_file))
        with open(onnx_file, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the onnx file.')
                for e in range(parser.num_errors):
                    print(parser.get_error(e))
                return None
        print("Completed parsing ONNX file")
        network.get_input(0).shape = [1, 3, 1152, 1152]
        # Serialize the model
        plan = builder.build_serialized_network(network, config)
        if plan is None:
            print('ERROR: Failed to build the serialized network.')
            return None
        # Deserialize the plan back into an engine
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")
        with open(self.cfg.engine_file_path, 'wb') as f:
            f.write(plan)
            print('save trt success ...')
        return engine
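
(Side note: builder.max_batch_size has no effect on an explicit-batch network, and config.max_workspace_size is deprecated in newer releases. On TRT 8.4+ the equivalent setup would look roughly like this sketch:)

import tensorrt as trt

logger = trt.Logger()
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
# TRT 8.4+ replacement for config.max_workspace_size = 1 << 32 (4 GiB)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 32)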

ywfwyht avatar Sep 23 '22 01:09 ywfwyht

Both Polygraphy and trtexec report the same segmentation fault.

ywfwyht avatar Sep 23 '22 01:09 ywfwyht

Do you mean you can reproduce it with trtexec? Can you share the ONNX with us? There may be a bug in TRT that we need to investigate; it should never segfault.

zerollzeng avatar Sep 23 '22 02:09 zerollzeng

You need to merge these three ONNX files: https://github.com/ywfwyht/onnx_model

ywfwyht avatar Sep 23 '22 09:09 ywfwyht

Do you mean you can reproduce it with trtexec? Can you share the ONNX with us? There may be a bug in TRT that we need to investigate; it should never segfault.

trtexec succeeds only when I add --best; without --best it still reports the segmentation fault.

ywfwyht avatar Sep 23 '22 09:09 ywfwyht
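
(For context, trtexec's --best allows all precisions. The rough Python builder-config equivalent is sketched below; note that accurate INT8 use normally also needs calibration data or quantization scales:)

import tensorrt as trt

logger = trt.Logger()
builder = trt.Builder(logger)
config = builder.create_builder_config()
# Roughly what --best enables: let the builder pick FP16/INT8 kernels
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)  # needs a calibrator for accurate INT8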

You need to merge these three ONNX files: https://github.com/ywfwyht/onnx_model

Can you upload it to Google Drive, or use Git LFS to add those files to your repo?

zerollzeng avatar Sep 23 '22 09:09 zerollzeng

You need to merge these three ONNX files: https://github.com/ywfwyht/onnx_model

Can you upload it to Google Drive, or use Git LFS to add those files to your repo?

I can't use Google Drive or Git LFS.

ywfwyht avatar Sep 23 '22 09:09 ywfwyht

Okay. Can you tell me how to merge these sub-ONNX models?

zerollzeng avatar Sep 23 '22 09:09 zerollzeng
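
(For reference, sub-models whose outputs feed the next model's inputs can often be merged with onnx.compose; a minimal sketch with hypothetical file and tensor names:)

import onnx
from onnx import compose

# Hypothetical names: io_map pairs an output tensor of the first
# model with the matching input tensor of the second.
m1 = onnx.load("part1.onnx")
m2 = onnx.load("part2.onnx")
merged = compose.merge_models(m1, m2, io_map=[("part1_out", "part2_in")])
onnx.save(merged, "merged.onnx")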

Okay. Can you tell me how to merge these sub-ONNX models?

I uploaded the full ONNX to a new repo with Git LFS: https://github.com/ywfwyht/onnx_model/blob/main/0905_p28_t1_seg.onnx

ywfwyht avatar Sep 26 '22 01:09 ywfwyht

I can reproduce this in 8.2.3, but the issue is fixed in TRT 8.4. I think the error comes from Myelin. cc @jackwish for visibility.

root@1ac28267c26e:# /usr/src/tensorrt/bin/trtexec --onnx=0905_p28_t1_seg.onnx --workspace=16000 --verbose
[09/26/2022-01:52:03] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: -8010679767156598961
[09/26/2022-01:52:03] [V] [TRT] =============== Computing costs for
[09/26/2022-01:52:03] [V] [TRT] *************** Autotuning format combination: Float(1327104,20736,144,1) -> Float(165888,20736,144,1) ***************
[09/26/2022-01:52:03] [V] [TRT] --------------- Timing Runner: {ForeignNode[753 + (Unnamed Layer* 164) [Shuffle]...Reshape_389]} (Myelin)
Segmentation fault (core dumped)

@ywfwyht the best way to get rid of this error is to upgrade to TRT 8.4, since we won't backport the fix to TRT 8.2. Can you try it on your side?
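
A quick way to confirm which TensorRT the Python environment actually loads after upgrading (a minimal sketch):

import tensorrt as trt

# Print the version of the TensorRT Python bindings in use
print(trt.__version__)  # expect 8.4.x after the upgrade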

zerollzeng avatar Sep 26 '22 01:09 zerollzeng

@zerollzeng, have you tested TRT 8.4?

ywfwyht avatar Sep 26 '22 02:09 ywfwyht

Yes, using the official 22.07 container with TRT 8.4.1 works for me.

[09/26/2022-01:41:58] [I] === Performance summary ===
[09/26/2022-01:41:58] [I] Throughput: 23.5252 qps
[09/26/2022-01:41:58] [I] Latency: min = 43.3171 ms, max = 45.5698 ms, mean = 43.8852 ms, median = 43.7786 ms, percentile(99%) = 45.5698 ms
[09/26/2022-01:41:58] [I] Enqueue Time: min = 41.7809 ms, max = 42.952 ms, mean = 42.2847 ms, median = 42.2079 ms, percentile(99%) = 42.952 ms
[09/26/2022-01:41:58] [I] H2D Latency: min = 1.43738 ms, max = 2.53459 ms, mean = 1.51407 ms, median = 1.45654 ms, percentile(99%) = 2.53459 ms
[09/26/2022-01:41:58] [I] GPU Compute Time: min = 41.8014 ms, max = 42.9887 ms, mean = 42.3121 ms, median = 42.2393 ms, percentile(99%) = 42.9887 ms
[09/26/2022-01:41:58] [I] D2H Latency: min = 0.0532227 ms, max = 0.0654297 ms, mean = 0.0590306 ms, median = 0.0585938 ms, percentile(99%) = 0.0654297 ms
[09/26/2022-01:41:58] [I] Total Host Walltime: 3.10306 s
[09/26/2022-01:41:58] [I] Total GPU Compute Time: 3.08878 s
[09/26/2022-01:41:58] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[09/26/2022-01:41:58] [W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[09/26/2022-01:41:58] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/26/2022-01:41:58] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8401] # /usr/src/tensorrt/bin/trtexec --onnx=0905_p28_t1_seg.onnx

zerollzeng avatar Sep 26 '22 03:09 zerollzeng

Yes, using the official 22.07 container with TRT 8.4.1 works for me.


OK, thanks.

ywfwyht avatar Sep 26 '22 05:09 ywfwyht