Zero Zeng
It should still work in 8.4 but will be deprecated in the future. More info: https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-4-0-EA
After exporting to ONNX, can you run the model with trtexec? I suspect that Torch and TRT may be using different CUDA libraries.
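For reference, a typical invocation looks something like this (the model filename is a placeholder); `--verbose` also makes the log show which tactic sources (cuBLAS/cuBLASLt/cuDNN) are in use:

```
trtexec --onnx=model.onnx --verbose
```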
Can you share the ONNX model here?
Looks like there are similar issues: https://github.com/NVIDIA/TensorRT/issues/1818 and https://github.com/NVIDIA/TensorRT/issues/2123. Can you check your cuBLASLt version in the log?
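If it helps, a quick way to dump the versions in play from Python (a minimal sketch; it won't show the cuBLASLt version itself, which appears in the verbose TensorRT log, but it confirms which TRT/CUDA combination you are on):

```python
import tensorrt as trt
import pycuda.driver as cuda

print("TensorRT:", trt.__version__)              # e.g. 8.4.x
print("CUDA (pycuda built against):", cuda.get_version())
print("CUDA driver:", cuda.get_driver_version())
```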
Also: https://github.com/NVIDIA/TensorRT/issues/866
Can you try removing

```
import torch
import torchvision.models as models
```

and all the torch stuff from your script? Only leave the TRT part, like:

```
import os
import ...
```
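For concreteness, a minimal sketch of what a torch-free TRT script could look like (the engine path, binding indices, shapes, and dtype below are placeholders, not taken from your setup):

```python
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# deserialize an engine built beforehand, e.g. with trtexec --saveEngine
with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# host/device buffers (assumes one input at binding 0, one output at binding 1)
h_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
h_output = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async_v2(bindings=[int(d_input), int(d_output)],
                         stream_handle=stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
print(h_output.shape)
```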
I couldn't reproduce this in my environment with CUDA 11.6. Also, it seems you're missing `import pycuda.autoinit`. Can you try upgrading to CUDA 11?

```
import pycuda.driver as cuda
import ...
```
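To illustrate why that import matters (a tiny sketch, not your code): `pycuda.autoinit` initializes the driver and makes a CUDA context current as a side effect of being imported, and without an active context allocation calls fail:

```python
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- side-effect import: driver init + context

d_buf = cuda.mem_alloc(1 << 20)  # succeeds only because a context is current
```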
My code:

```
import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
import time

# build engine with trtexec
BATCH_SIZE = 32
target_dtype = np.float16
...
```
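For the "build engine with trtexec" step, the command is along these lines (filenames are placeholders; `--fp16` matches the `np.float16` target dtype above):

```
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```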
I haven't used it before, but I guess https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/ExecutionContext.html#tensorrt.IExecutionContext.report_to_profiler is the answer. @nvpohanh may know more about it.
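If that is the right API, usage would look roughly like this (a hedged sketch; it assumes `context` and `bindings` are already set up as in the script above). Attaching a `trt.IProfiler` subclass makes execution synchronous and triggers a per-layer timing callback:

```python
import tensorrt as trt

class LayerProfiler(trt.IProfiler):
    def __init__(self):
        super().__init__()  # required when subclassing trt.IProfiler
        self.timings = {}

    def report_layer_time(self, layer_name, ms):
        # TensorRT invokes this once per layer per profiled execution
        self.timings[layer_name] = self.timings.get(layer_name, 0.0) + ms

profiler = LayerProfiler()
context.profiler = profiler   # context from an existing engine setup
context.execute_v2(bindings)  # synchronous run triggers the callbacks
for name, ms in profiler.timings.items():
    print(f"{name}: {ms:.3f} ms")
```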