
Segmentation fault of TensorRT 8.6 when running `trtexec --onnx=<file>` on GPU V100

Open lhai37 opened this issue 1 year ago • 2 comments

Description

I tried to run the attached model with the trtexec tool on a V100 GPU with TensorRT 8.6 and CUDA 12.1, but it fails with the Segmentation fault (core dumped) error shown below. The same model loads fine with TensorRT 8.4 and CUDA 11.6 on a GTX 1080. Note: this is possibly related to https://github.com/NVIDIA/TensorRT/issues/3631; it is the same model, but with a dynamic batch size.

./trtexec --onnx=trtexec_segfault.onnx --verbose
...omitted output, see attached log...
Segmentation fault (core dumped)

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: Tesla V100

NVIDIA Driver Version: 545.23.08

CUDA Version: 12.1

CUDNN Version: 8.9.0.131-1+cuda12.1

Operating System: Ubuntu 20.04

Python Version (if applicable): N/A

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): N/A

Baremetal or Container (if so, version): N/A

Relevant Files

Model link: https://drive.google.com/file/d/10old1P-M5gafvWjjLVI3khkiGnlVVB9L/view?usp=sharing

Output log: trtexec_segfault.txt

Steps To Reproduce

Commands or scripts: ./trtexec --onnx=trtexec_segfault.onnx --verbose

Have you tried the latest release?: Yes

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): This model runs with TensorRT 8.4 and CUDA 11.6 on a GTX 1080

lhai37 • Jan 24 '24 19:01

Could you please try TRT 9.2? This looks like an issue that has already been fixed. Links:
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-11.8.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-12.2.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.ubuntu-22.04.aarch64-gnu.cuda-12.2.tar.gz

[I] Finished engine building in 42.643 seconds
[I] trt-runner-N0-01/27/24-07:47:50
    ---- Inference Input(s) ----
    {img [dtype=float32, shape=(1, 3, 720, 1280)],
     seg [dtype=float32, shape=(1, 1, 720, 1280)]}
[I] trt-runner-N0-01/27/24-07:47:50
    ---- Inference Output(s) ----
    {mask [dtype=float32, shape=(1, 1, 720, 1280)]}
[I] trt-runner-N0-01/27/24-07:47:50     | Completed 1 iteration(s) in 11 ms | Average inference time: 11 ms.
[I] PASSED | Runtime: 46.264s | Command: /home/scratch.zeroz_sw/miniconda3/bin/polygraphy run trtcppapi_segfault.onnx --trt
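
For anyone re-running the dynamic-batch model, the input shapes reported in the log above could be pinned explicitly through trtexec's `--minShapes`, `--optShapes`, and `--maxShapes` options. The following is only a minimal sketch of how such a command line might be assembled. It assumes the dynamic-batch model uses the same input names (`img`, `seg`) and per-sample dimensions as in the log, which this thread does not confirm; the helper names and the maximum batch size of 4 are hypothetical. It is not known whether pinning shapes affects the segfault.

```python
import shlex


def shape_flag(shapes):
    """Format {input_name: dims} as a trtexec shape spec, e.g. 'img:1x3x720x1280'."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


def with_batch(batch):
    # Input names and per-sample dims are copied from the polygraphy log;
    # it is an assumption that the dynamic-batch model shares them.
    return {"img": (batch, 3, 720, 1280), "seg": (batch, 1, 720, 1280)}


def trtexec_command(onnx_path, min_shapes, opt_shapes, max_shapes, trtexec="./trtexec"):
    """Build the trtexec argument list with an explicit dynamic-shape range."""
    return [
        trtexec,
        f"--onnx={onnx_path}",
        f"--minShapes={shape_flag(min_shapes)}",
        f"--optShapes={shape_flag(opt_shapes)}",
        f"--maxShapes={shape_flag(max_shapes)}",
        "--verbose",
    ]


if __name__ == "__main__":
    # Hypothetical batch range 1..4; adjust to the range the model is meant to serve.
    cmd = trtexec_command(
        "trtexec_segfault.onnx", with_batch(1), with_batch(1), with_batch(4)
    )
    print(shlex.join(cmd))
```

The script only prints the command, so it can be inspected before being run against a real TensorRT install.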

zerollzeng • Jan 27 '24 07:01

the above test was for #3631

zerollzeng • Jan 27 '24 08:01

closing since no activity for more than 3 weeks, thanks all!

ttyio • Mar 05 '24 17:03