TensorRT
Problem converting an ONNX model to TRT
Description
Hi, I want to convert my ONNX model to TRT. I use the following command:
/usr/src/tensorrt/bin/trtexec --onnx=model_folded.onnx --verbose --explicitBatch --saveEngine=model.trt
When I searched for this problem, I found an NVIDIA collaborator suggesting that it may be solved by the command below; he also mentioned that constant-folding the model may help:
polygraphy surgeon sanitize model.onnx --fold-constants --output model_folded.onnx
However, conversion to TRT fails for both the original and the folded model.
The error is:
[08/09/2022-07:55:01] [V] [TRT] --------------- Timing Runner: {ForeignNode[onnx::Add_746 + (Unnamed Layer* 20) [Shuffle]...Transpose_245 + (Unnamed Layer* 459) [Shuffle]]} (Myelin)
[08/09/2022-07:55:02] [E] Error[1]: [graphContext.h::~MyelinGraphContext::35] Error Code 1: Myelin (no further information)
[08/09/2022-07:55:02] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: cuBLAS initialization failed: 3.
[08/09/2022-07:55:02] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[08/09/2022-07:55:02] [V] [TRT] Deleting timing cache: 164 entries, served 8188 hits since creation.
[08/09/2022-07:55:02] [E] Error[10]: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::Add_746 + (Unnamed Layer* 20) [Shuffle]...Transpose_245 + (Unnamed Layer* 459) [Shuffle]]}.)
[08/09/2022-07:55:02] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[08/09/2022-07:55:02] [E] Engine could not be created from network
[08/09/2022-07:55:02] [E] Building engine failed
[08/09/2022-07:55:02] [E] Failed to create engine from model or file.
[08/09/2022-07:55:02] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8402] # /usr/src/tensorrt/bin/trtexec --onnx=model_folded.onnx --verbose --explicitBatch --saveEngine=model.trt
Environment
TensorRT Version: 8.4.2.4-1+cuda11.6
NVIDIA GPU: Tesla P100
NVIDIA Driver Version: 440.33.01
CUDA Version: 11.3
CUDNN Version: 8.4.0.27
Operating System: Ubuntu 20.04.2 LTS
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable): 2.9.1
PyTorch Version (if applicable): 1.9.0a0+c3d40fd
Looks like a bug. Can you share the ONNX model here?
@zerollzeng Unfortunately no. It's 485Mb and I can't upload it.
Looks like a CUBLAS_STATUS_ALLOC_FAILED:
cuBLAS initialization failed: 3
Maybe you're running out of memory on your GPU? Do other networks work on this GPU? I'm wondering if it could be a driver/setup issue.
@pranavm-nvidia No. While trtexec was running, I checked my GPU memory every second, and it never used all of it. I also checked other resources, like RAM, and everything was fine.
@zerollzeng Unfortunately no. It's 485Mb and I can't upload it.
You can upload it to Google Drive and share the link here.
@zerollzeng Yeah I know that but I don't have permission. It's commercial.
I would suspect this is a Myelin bug. @jackwish for visibility.
As @pranavm-nvidia mentioned above, cuBLAS initialization failed: 3 is likely to be a setup issue.
CUDA 11.x requires CUDA driver >= 450.80.02* according to the CUDA compatibility doc while your setup is CUDA 11.3 + driver 440.33.01.
@alexandercesarr Could you please upgrade your driver to an appropriate version (we suggest using the one bundled in the CUDA toolkit package)? To isolate similar issues, we suggest starting with the same CUDA version as the TensorRT build, i.e. CUDA 11.6 in your case.
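For reference, the driver-versus-CUDA check above can be done mechanically. Below is a small stand-alone Python sketch (not part of any NVIDIA tooling) that compares a driver version string against the 450.80.02 minimum cited above; in practice the installed version would be read from `nvidia-smi`.

```python
# Sketch: compare an NVIDIA driver version string against a required minimum.
# The minimum below is the CUDA 11.x requirement cited in this thread
# (450.80.02); the installed version would normally come from `nvidia-smi`.

def parse_version(v: str) -> tuple:
    """Turn '440.33.01' into (440, 33, 1) so tuples compare numerically."""
    return tuple(int(part) for part in v.split("."))

def driver_ok(installed: str, required: str = "450.80.02") -> bool:
    """True if the installed driver meets the required minimum."""
    return parse_version(installed) >= parse_version(required)

print(driver_ok("440.33.01"))  # driver from this thread -> False
print(driver_ok("515.65.01"))  # an example newer driver -> True
```

Running this against the 440.33.01 driver from the environment above immediately flags it as too old for CUDA 11.3, which matches the diagnosis in this thread.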
Hi, sorry for my late reply. That was it. I upgraded my driver and it ran without any problem. Thanks @zerollzeng & @jackwish
Glad to hear that! Closing this issue now. Please let us know if you hit any further issues.