
Something is wrong with the TensorRT sample in WSL2

Open triple-Mu opened this issue 3 years ago • 11 comments

Description

I tried to run the TensorRT sample sample_mnist in WSL2 and got incorrect output. This is the log:

&&&& RUNNING TensorRT.sample_mnist [TensorRT v8203] # ./sample_mnist
[01/30/2022-16:45:10] [I] Building and running a GPU inference engine for MNIST
[01/30/2022-16:45:11] [I] [TRT] [MemUsageChange] Init CUDA: CPU +456, GPU +0, now: CPU 467, GPU 1249 (MiB)
[01/30/2022-16:45:11] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 467 MiB, GPU 1249 MiB
[01/30/2022-16:45:11] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 530 MiB, GPU 1249 MiB
[01/30/2022-16:45:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +791, GPU +340, now: CPU 1323, GPU 1589 (MiB)
[01/30/2022-16:45:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +196, GPU +342, now: CPU 1519, GPU 1931 (MiB)
[01/30/2022-16:45:12] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/30/2022-16:45:21] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[01/30/2022-16:45:22] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[01/30/2022-16:45:22] [I] [TRT] Total Host Persistent Memory: 9536
[01/30/2022-16:45:22] [I] [TRT] Total Device Persistent Memory: 1626624
[01/30/2022-16:45:22] [I] [TRT] Total Scratch Memory: 0
[01/30/2022-16:45:22] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 13 MiB
[01/30/2022-16:45:22] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.017826ms to assign 3 blocks to 8 nodes requiring 57857 bytes.
[01/30/2022-16:45:22] [I] [TRT] Total Activation Memory: 57857
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2498, GPU 2415 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2498, GPU 2425 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2499, GPU 2387 (MiB)
[01/30/2022-16:45:22] [I] [TRT] Loaded engine size: 2 MiB
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2500, GPU 2397 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2500, GPU 2405 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1, now: CPU 0, GPU 1 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2434, GPU 2397 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2434, GPU 2405 (MiB)
[01/30/2022-16:45:22] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 0, GPU 3 (MiB)
[01/30/2022-16:45:22] [I] Input:
[28x28 ASCII-art rendering of the input digit; whitespace lost in the paste]

[01/30/2022-16:45:22] [I] Output:
0:
1:
2: **
3: **
4:
5:
6:
7:
8: *
9: *****

&&&& FAILED TensorRT.sample_mnist [TensorRT v8203] # ./sample_mnist
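In the sample's output, each digit's row gets a number of asterisks proportional to its softmax score, so a correct run prints a full ten-asterisk bar for exactly one digit. A minimal sketch of that rendering (the round-to-tenths scaling is an assumption, not the sample's exact code):

```python
def render_histogram(probs):
    """Render per-digit probabilities as asterisk bars, one row per class."""
    return [f"{digit}: {'*' * int(p * 10 + 0.5)}" for digit, p in enumerate(probs)]

# A healthy run concentrates probability on one digit and prints one full bar;
# the failing runs above scatter a few asterisks across several digits instead.
healthy = [0.0] * 10
healthy[8] = 1.0
for row in render_histogram(healthy):
    print(row)
```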

Environment

TensorRT Version: 8.2.3
NVIDIA GPU: RTX 3070 Laptop
NVIDIA Driver Version: 510.06
CUDA Version: 11.43
CUDNN Version: 8.2.4
Operating System: WSL2 Ubuntu 20.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10.2
Baremetal or Container (if so, version):

Relevant Files

https://github.com/NVIDIA/TensorRT/blob/main/samples/sampleMNIST/README.md

Steps To Reproduce

If you use the environment Win11 + WSL2 + Ubuntu 20.04, you will get the same output.

triple-Mu avatar Jan 30 '22 08:01 triple-Mu

same problem

lzwhard avatar May 09 '22 10:05 lzwhard

Linking similar issue #1771

galagam avatar May 26 '22 06:05 galagam

My environment is:

TensorRT Version: 8.2.2.1
NVIDIA GPU: T1200 Laptop
NVIDIA Driver Version: 516.25
CUDA Version: 11.4
CUDNN Version: 8.4.1
Operating System: Win10 WSL2 Ubuntu 18.04 LTS
Kernel: 5.10.102.1-microsoft-standard-WSL2

I installed CUDA, cuDNN, and TensorRT from the tar packages, and I hit the same issue:

&&&& RUNNING TensorRT.sample_mnist [TensorRT v8202] # ./sample_mnist
[06/20/2022-11:53:01] [I] Building and running a GPU inference engine for MNIST
[06/20/2022-11:53:02] [I] [TRT] [MemUsageChange] Init CUDA: CPU +320, GPU +0, now: CPU 331, GPU 905 (MiB)
[06/20/2022-11:53:02] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 331 MiB, GPU 905 MiB
[06/20/2022-11:53:02] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 394 MiB, GPU 905 MiB
[06/20/2022-11:53:03] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[06/20/2022-11:53:03] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +484, GPU +206, now: CPU 881, GPU 1111 (MiB)
[06/20/2022-11:53:03] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +117, GPU +52, now: CPU 998, GPU 1163 (MiB)
[06/20/2022-11:53:03] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/20/2022-11:53:06] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[06/20/2022-11:53:07] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[06/20/2022-11:53:07] [I] [TRT] Total Host Persistent Memory: 8992
[06/20/2022-11:53:07] [I] [TRT] Total Device Persistent Memory: 1626624
[06/20/2022-11:53:07] [I] [TRT] Total Scratch Memory: 0
[06/20/2022-11:53:07] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 13 MiB
[06/20/2022-11:53:07] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.023214ms to assign 3 blocks to 10 nodes requiring 57857 bytes.
[06/20/2022-11:53:07] [I] [TRT] Total Activation Memory: 57857
[06/20/2022-11:53:07] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1505, GPU 1389 (MiB)
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1506, GPU 1397 (MiB)
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1507, GPU 1359 (MiB)
[06/20/2022-11:53:07] [I] [TRT] Loaded engine size: 1 MiB
[06/20/2022-11:53:07] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1507, GPU 1369 (MiB)
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1507, GPU 1377 (MiB)
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1, now: CPU 0, GPU 1 (MiB)
[06/20/2022-11:53:07] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1441, GPU 1369 (MiB)
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1441, GPU 1377 (MiB)
[06/20/2022-11:53:07] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 0, GPU 3 (MiB)
[06/20/2022-11:53:07] [I] Input:
[28x28 ASCII-art rendering of the input digit; whitespace lost in the paste]

[06/20/2022-11:53:07] [I] Output:
0:
1:
2:
3:
4:
5: **********
6:
7:
8:
9:

&&&& FAILED TensorRT.sample_mnist [TensorRT v8202] # ./sample_mnist

wuwenxuan0226 avatar Jun 20 '22 04:06 wuwenxuan0226

[06/20/2022-11:53:03] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2

Could you upgrade cuBLAS to 11.6.3 and see if the issue is fixed? If not, please enable verbose logging and share the log. Thanks!

nvpohanh avatar Jun 24 '22 06:06 nvpohanh

[06/20/2022-11:53:03] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.5.2

Could you upgrade cuBLAS to 11.6.3 and see if the issue is fixed? If not, please enable verbose logging and share the log. Thanks!

I have tried CUDA toolkit 11.7 with cuDNN 8.4.1, so cuBLAS is the latest version. The problem is still not fixed. Do you mean I should try the specific cuBLAS 11.6.3? Sorry, I can't enable verbose logging because my environment is on my company PC. Otherwise, I will try it on Win11 for the new feature.

wuwenxuan0226 avatar Jun 24 '22 08:06 wuwenxuan0226

@wuwenxuan0226 Could you try TRT 8.4.1.5 + CUDNN 8.4.1 + CUDA 11.6?

nvpohanh avatar Jul 01 '22 06:07 nvpohanh

I reinstalled WSL and upgraded it to WSL2. Then I reinstalled TRT 8.4.1.5 + cuDNN 8.4.1 + CUDA 11.6 via the .run file and .tar package, and I still get the error.

This is my information:

(wenxuan) wenxuan@DL7J0L4M3:~$ nvidia-smi
Sat Jul  2 13:09:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T1200 La...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P8     7W /  N/A |    453MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

(wenxuan) wenxuan@DL7J0L4M3:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

/** \file: The master cuDNN version file. */

#ifndef CUDNN_VERSION_H_
#define CUDNN_VERSION_H_

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 1

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */
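The CUDNN_VERSION macro in that header packs major/minor/patch into a single integer, which is what runtime version checks compare. A one-line check of the arithmetic for the 8.4.1 install above:

```python
# Values from the cudnn_version.h shown above
CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL = 8, 4, 1

# Mirrors: #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
CUDNN_VERSION = CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL
print(CUDNN_VERSION)  # → 8401, i.e. cuDNN 8.4.1
```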

#ifndef NV_INFER_VERSION_H
#define NV_INFER_VERSION_H

#define NV_TENSORRT_MAJOR 8 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 4 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
#define NV_TENSORRT_BUILD 5 //!< TensorRT build number.

#define NV_TENSORRT_LWS_MAJOR 0 //!< TensorRT LWS major version.
#define NV_TENSORRT_LWS_MINOR 0 //!< TensorRT LWS minor version.
#define NV_TENSORRT_LWS_PATCH 0 //!< TensorRT LWS patch version.

#define NV_TENSORRT_SONAME_MAJOR 8 //!< Shared object library major version number.
#define NV_TENSORRT_SONAME_MINOR 4 //!< Shared object library minor version number.
#define NV_TENSORRT_SONAME_PATCH 1 //!< Shared object library patch version number.

#endif // NV_INFER_VERSION_H

wuwenxuan0226 avatar Jul 02 '22 05:07 wuwenxuan0226

@wuwenxuan0226 Could you try TRT 8.4.1.5 + CUDNN 8.4.1 + CUDA 11.6?

I also tried installing via .deb. I passed the CUDA ./deviceQuery and cuDNN ./mnistCUDNN tests, but I cannot pass the TensorRT ./sample_mnist test.

wuwenxuan0226 avatar Jul 02 '22 05:07 wuwenxuan0226

The bug is still not fixed.

triple-Mu avatar Jul 02 '22 13:07 triple-Mu

I have the same problem. The output is:

0: *
1: *
2: *
3: *
4: *
5: *
6: *
7: *
8: *
9: *

&&&& FAILED TensorRT.sample_mnist

GPU: RTX 3060 Laptop
CUDA toolkit: 11.0, 11.6, 11.7
cuDNN: newest version, 8.4.1
TensorRT: 8.4.1
OS: WSL2 on Windows 11, Ubuntu 20.04 and Ubuntu 18.04

All of the above environment combinations failed the test.

ZhengXB210 avatar Jul 20 '22 08:07 ZhengXB210


I have the same issue: the probability is 0.1 for all classes, regardless of the input.
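A probability of exactly 0.1 for every class is what softmax produces when all ten logits are identical, which suggests the engine is emitting a constant (e.g. zeroed) output buffer rather than mis-scaled scores. A quick check in plain Python, no TensorRT needed:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Ten identical logits (e.g. an all-zero output buffer) yield exactly 0.1 each:
probs = softmax([0.0] * 10)
print(probs[0])  # → 0.1
```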

ChristianOrr avatar Sep 06 '22 15:09 ChristianOrr

I have the same problem. The output is:

[10/25/2022-22:40:42] [I] Output:
0: *
1: *
2: *
3: *
4: *
5: *
6: *
7: *
8: *
9: *

&&&& FAILED TensorRT.sample_mnist [TensorRT v8403] # ./sample_mnist --datadir ../data/mnist --fp16

QCHighing avatar Oct 25 '22 14:10 QCHighing

This issue is still there. Not sure what's causing it. I am using WSL Ubuntu 20.04

CUDA 11.6
cuDNN cudnn-11.6-linux-x64-v8.4.1.50
TensorRT-8.4.3.1

Driver: 526.47
GPU: 2080ti

I have a script that runs YOLOv5 straight from Torch, then exports to ONNX and runs the ONNX model with ONNX Runtime, then converts the ONNX model to TRT Engine and runs it with TRT.

Both Torch and ONNX return the same results. TRT returns garbage data of the correct dimensions. I tried running the code on a Linux machine using the exact same setup and I get identical results with all 3 runtimes.

It seems that TRT runs fine, but produces garbage when running under WSL.
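A cheap way to automate the Torch/ONNX/TRT comparison described above is to flatten each runtime's output and compare pairwise; a minimal sketch in plain Python (numpy.allclose does the same job, and the tolerances and sample values here are assumptions for illustration):

```python
def outputs_match(a, b, atol=1e-3, rtol=1e-3):
    """numpy.allclose-style element-wise comparison of two flat output lists."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

# Hypothetical flattened outputs: Torch and ONNX Runtime agree, while a
# garbage TRT result of the correct shape does not.
torch_out = [0.12, 0.85, 0.03]
onnx_out  = [0.12, 0.85, 0.03]
trt_out   = [0.33, 0.33, 0.34]
```

Under WSL, a check like this passes for Torch vs ONNX Runtime but fails for the TRT engine, which narrows the fault to TRT execution rather than the export step.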

I see the same problem being reported on the nvidia forums [link]

Could you share a nvbug or another means through which people interested in the progress on this issue can monitor what's going on? I see people have reported this over the past year. Is there a pending solution? When can we expect a fix?

alex-soklev avatar Nov 28 '22 18:11 alex-soklev


Just tested: it works with the latest TRT 8.5.1.7.

alex-soklev avatar Nov 29 '22 10:11 alex-soklev

Closing due to inactivity for more than 3 weeks; feel free to reopen if you have any further questions. Thanks!

zerollzeng avatar Feb 15 '23 03:02 zerollzeng

Facing a somewhat similar issue with converting YOLOv7 from https://github.com/WongKinYiu/yolov7/tree/main/deploy/triton-inference-server

using image nvcr.io/nvidia/tensorrt:22.06-py3 from NGC

It runs for about 45 minutes, during which the SSD's utilization is 100%; then it simply says "Killed", with no engine file being created.

Core i7-8750H
RTX 2070 Laptop
Windows 11 (WSL2)
CUDA 12.1 (531.41 driver version)

danzz006 avatar May 27 '23 07:05 danzz006