TensorRT
onnx2tensorrt: Fatal Python error: Segmentation fault
Description
[09/22/2022-07:36:20] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Completed parsing ONNX file
[09/22/2022-07:36:21] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.2.1
[09/22/2022-07:36:21] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
Fatal Python error: Segmentation fault
Hi @pbridger @aaronp24 @ttyio @lukeyeager @elezar, I can only narrow the error down to this line of code: plan = builder.build_serialized_network(network, config). How do I solve this problem?
Environment
TensorRT Version: 8.2.3
NVIDIA GPU: 3090ti
NVIDIA Driver Version:
CUDA Version: 11.2
CUDNN Version: 8.0.5
Operating System: ubuntu18.04
Python Version (if applicable): 3.8.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.8
Baremetal or Container (if so, version):
Looks like a usage issue; very likely something in your code is wrong. Can you provide a repro?
@zerollzeng, thanks so much.
TRT_LOGGER = trt.Logger()
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

if os.path.exists(self.cfg.engine_file_path):
    # If a serialized engine exists, use it instead of building one.
    print("Reading engine from file {}".format(self.cfg.engine_file_path))
    with open(self.cfg.engine_file_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())
else:
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(EXPLICIT_BATCH) as network, \
            builder.create_builder_config() as config, \
            trt.OnnxParser(network, TRT_LOGGER) as parser, \
            trt.Runtime(TRT_LOGGER) as runtime:
        config.max_workspace_size = 1 << 32
        builder.max_batch_size = 1
        # config.set_flag(trt.BuilderFlag.FP16)
        if not os.path.exists(onnx_file):
            print('ONNX file {} not found.'.format(onnx_file))
            exit(0)
        print('Loading ONNX file from path {} ...'.format(onnx_file))
        with open(onnx_file, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for e in range(parser.num_errors):
                    print(parser.get_error(e))
                return None
        print("Completed parsing ONNX file")
        network.get_input(0).shape = [1, 3, 1152, 1152]
        # Serialize the model (this is the line that segfaults).
        plan = builder.build_serialized_network(network, config)
        if plan is None:
            print('ERROR: build_serialized_network returned None.')
            return None
        # Deserialize the plan into an engine.
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating engine")
        with open(self.cfg.engine_file_path, 'wb') as f:
            f.write(plan)
            # f.write(engine.serialize())  # equivalent alternative
        print('save trt success ...')
        return engine
Both Polygraphy and trtexec report the same segmentation fault.
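For reference, a minimal sketch of the kind of commands that hit this (model.onnx is a placeholder for the actual model file):

# both of these report the segmentation fault
trtexec --onnx=model.onnx
polygraphy run model.onnx --trt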
Do you mean you can reproduce it with trtexec? Can you share the ONNX with us? There may be a bug in TRT that we need to investigate; it should never seg fault.
You need to merge these three ONNX files: https://github.com/ywfwyht/onnx_model
trtexec must be run with --best to succeed. If I do not add --best, the segmentation fault is still reported.
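Concretely, the behavior is along these lines (a sketch; the model path is illustrative):

# succeeds: --best enables all precisions (FP32/FP16/INT8)
trtexec --onnx=model.onnx --best
# segfaults without --best
trtexec --onnx=model.onnx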
Can you upload it to Google Drive, or use Git LFS to add those files to your repo?
I cannot use Google Drive or Git LFS.
Okay. Can you tell me how to merge these sub-ONNX models?
I uploaded the full ONNX via Git LFS: https://github.com/ywfwyht/onnx_model/blob/main/0905_p28_t1_seg.onnx
I can reproduce this in 8.2.3, but the issue is fixed in TRT 8.4. I think the error comes from Myelin. cc @jackwish for visibility.
root@1ac28267c26e:# /usr/src/tensorrt/bin/trtexec --onnx=0905_p28_t1_seg.onnx --workspace=16000 --verbose
[09/26/2022-01:52:03] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: -8010679767156598961
[09/26/2022-01:52:03] [V] [TRT] =============== Computing costs for
[09/26/2022-01:52:03] [V] [TRT] *************** Autotuning format combination: Float(1327104,20736,144,1) -> Float(165888,20736,144,1) ***************
[09/26/2022-01:52:03] [V] [TRT] --------------- Timing Runner: {ForeignNode[753 + (Unnamed Layer* 164) [Shuffle]...Reshape_389]} (Myelin)
Segmentation fault (core dumped)
@ywfwyht the best way to get rid of this error is to upgrade to TRT 8.4, since we won't backport the fix to TRT 8.2. Can you try it on your side?
@zerollzeng, have you tested TRT 8.4?
Yes, using the official container 22.07 with TRT 8.4.1 works for me.
[09/26/2022-01:41:58] [I] === Performance summary ===
[09/26/2022-01:41:58] [I] Throughput: 23.5252 qps
[09/26/2022-01:41:58] [I] Latency: min = 43.3171 ms, max = 45.5698 ms, mean = 43.8852 ms, median = 43.7786 ms, percentile(99%) = 45.5698 ms
[09/26/2022-01:41:58] [I] Enqueue Time: min = 41.7809 ms, max = 42.952 ms, mean = 42.2847 ms, median = 42.2079 ms, percentile(99%) = 42.952 ms
[09/26/2022-01:41:58] [I] H2D Latency: min = 1.43738 ms, max = 2.53459 ms, mean = 1.51407 ms, median = 1.45654 ms, percentile(99%) = 2.53459 ms
[09/26/2022-01:41:58] [I] GPU Compute Time: min = 41.8014 ms, max = 42.9887 ms, mean = 42.3121 ms, median = 42.2393 ms, percentile(99%) = 42.9887 ms
[09/26/2022-01:41:58] [I] D2H Latency: min = 0.0532227 ms, max = 0.0654297 ms, mean = 0.0590306 ms, median = 0.0585938 ms, percentile(99%) = 0.0654297 ms
[09/26/2022-01:41:58] [I] Total Host Walltime: 3.10306 s
[09/26/2022-01:41:58] [I] Total GPU Compute Time: 3.08878 s
[09/26/2022-01:41:58] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[09/26/2022-01:41:58] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[09/26/2022-01:41:58] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/26/2022-01:41:58] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8401] # /usr/src/tensorrt/bin/trtexec --onnx=0905_p28_t1_seg.onnx
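For anyone who wants to reproduce this, a sketch of the container workflow (the exact image tag is my assumption based on the 22.07 release of the NGC TensorRT container):

# pull and start the container that ships TRT 8.4.1 (tag assumed)
docker pull nvcr.io/nvidia/tensorrt:22.07-py3
docker run --gpus all -it -v $(pwd):/workspace nvcr.io/nvidia/tensorrt:22.07-py3
# inside the container
/usr/src/tensorrt/bin/trtexec --onnx=/workspace/0905_p28_t1_seg.onnx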
ok, thanks