
Sample Mnist - can't run with DLA

MTammvee opened this issue 3 years ago

Description

Running the sample works without utilizing the DLAs. Passing --useDLACore=0 does not work and ends with the following error message:

NVMEDIA_DLA : 528, ERROR: load from memory failed. [E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.) [E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
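For context, the --useDLACore flag in the samples maps to a small amount of builder configuration. Below is a minimal sketch of what that setup looks like with the TensorRT 5.x builder API; the helper name is made up here, and the exact combination of flags is an assumption based on the warnings in the log below (GPU fallback enabled, INT8 mode, DLA as the default device):

#include "NvInfer.h"  // TensorRT 5.x builder API

// Hypothetical helper: roughly the configuration implied by --useDLACore=N --int8.
void enableDlaSketch(nvinfer1::IBuilder* builder, int dlaCore, bool useInt8)
{
    // DLA only runs INT8 or FP16 layers, so one of the reduced-precision modes is required.
    if (useInt8)
        builder->setInt8Mode(true);
    else
        builder->setFp16Mode(true);

    builder->allowGPUFallback(true);                            // layers DLA cannot run fall back to the GPU
    builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);  // prefer DLA for every layer
    builder->setDLACore(dlaCore);                               // pick DLA core 0 or 1
}

In INT8 mode without a calibrator, a per-tensor dynamic range also has to be supplied, which is what the "Setting dynamic range ... to [-127,127]" lines further down correspond to.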

Environment

TensorRT Version: 5.1.5
NVIDIA GPU: NVIDIA Volta™-class integrated GPU
CUDA Version: 10.1
CUDNN Version: 7.5.1
Operating System: Debian

Relevant Files

The samples are from NVIDIA's official tarball (version 5.1, https://developer.nvidia.com/nvidia-tensorrt-5x-download), together with the relevant data.

Steps To Reproduce

./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0

&&&& RUNNING TensorRT.sample_onnx_mnist # ./sample_onnx_mnist --datadir=/root/tensorrt_sample/mnist_data --int8 --useDLACore=0
[I] Building and running a GPU inference engine for Onnx MNIST

Input filename: /root/tensorrt_sample/mnist_data/mnist.onnx
ONNX IR version: 0.0.3
Opset version: 1
Producer name: CNTK
Producer version: 2.4
Domain:
Model version: 1
Doc string:

[I] [TRT] Parameter193:Constant -> (16, 4, 4, 10) [I] [TRT] Parameter193_reshape1:Reshape -> (256, 10) [I] [TRT] Parameter6:Constant -> (8) [I] [TRT] Parameter5:Constant -> (8, 1, 5, 5) [I] [TRT] Convolution28_Output_0:Conv -> (8, 28, 28) [I] [TRT] Plus30_Output_0:Add -> (8, 28, 28) [I] [TRT] ReLU32_Output_0:Relu -> (8, 28, 28) [I] [TRT] Pooling66_Output_0:MaxPool -> (8, 14, 14) [I] [TRT] Parameter87:Constant -> (16, 8, 5, 5) [I] [TRT] Convolution110_Output_0:Conv -> (16, 14, 14) [I] [TRT] Parameter88:Constant -> (16) [I] [TRT] Plus112_Output_0:Add -> (16, 14, 14) [I] [TRT] ReLU114_Output_0:Relu -> (16, 14, 14) [I] [TRT] Pooling160_Output_0:MaxPool -> (16, 4, 4) [I] [TRT] Pooling160_Output_0_reshape0:Reshape -> (256) [I] [TRT] Times212_Output_0:MatMul -> (10) [I] [TRT] Parameter194:Constant -> (1, 10) [I] [TRT] Plus214_Output_0:Add -> (10) ----- Parsing of ONNX model /root/tensorrt_sample/mnist_data/mnist.onnx is Done ---- [I] [TRT] Setting dynamic range for Input3 to [-127,127] [I] [TRT] Setting dynamic range for Convolution28_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for (Unnamed Layer* 1) [Constant]_output to [-127,127] [I] [TRT] Setting dynamic range for Plus30_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for ReLU32_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for Pooling66_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for Convolution110_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for (Unnamed Layer* 6) [Constant]_output to [-127,127] [I] [TRT] Setting dynamic range for Plus112_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for ReLU114_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for Pooling160_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for Pooling160_Output_0_reshape0 to [-127,127] [I] [TRT] Setting dynamic range for (Unnamed Layer* 11) [Constant]_output to [-127,127] [I] [TRT] Setting dynamic range for Times212_Output_0 to [-127,127] [I] [TRT] Setting dynamic range for (Unnamed Layer* 13) [Constant]_output to [-127,127] [I] [TRT] Setting dynamic range for Plus214_Output_0 to [-127,127] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 1) [Constant] is not running on DLA, falling back to GPU. [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 6) [Constant] is not running on DLA, falling back to GPU. [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 10) [Shuffle] is not running on DLA, falling back to GPU. [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 11) [Constant] is not running on DLA, falling back to GPU. [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 12) [Matrix Multiply] is not running on DLA, falling back to GPU. [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 13) [Constant] is not running on DLA, falling back to GPU. [I] [TRT] [I] [TRT] --------------- Layers running on DLA: [I] [TRT] (Unnamed Layer* 0) [Convolution], (Unnamed Layer* 2) [ElementWise], (Unnamed Layer* 3) [Activation], (Unnamed Layer* 4) [Pooling], (Unnamed Layer* 5) [Convolution], (Unnamed Layer* 7) [ElementWise], (Unnamed Layer* 8) [Activation], (Unnamed Layer* 9) [Pooling], (Unnamed Layer* 14) [ElementWise], [I] [TRT] --------------- Layers running on GPU: [I] [TRT] (Unnamed Layer* 1) [Constant], (Unnamed Layer* 6) [Constant], (Unnamed Layer* 10) [Shuffle], (Unnamed Layer* 11) [Constant], (Unnamed Layer* 12) [Matrix Multiply], (Unnamed Layer* 13) [Constant], [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32. 
[I] [TRT] [INT8 Quantization] User overriding Scales: Input3 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Convolution28_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 1) [Constant]_output [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Plus30_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: ReLU32_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Pooling66_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Convolution110_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 6) [Constant]_output [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Plus112_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: ReLU114_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Pooling160_Output_0_reshape0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 11) [Constant]_output [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Times212_Output_0 [1] [I] [TRT] [INT8 Quantization] User overriding Scales: (Unnamed Layer* 13) [Constant]_output [1] [I] [TRT] [INT8 Quantization] User overriding Scales: Plus214_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Input3 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution28_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 1) [Constant]_output [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus30_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU32_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling66_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Convolution110_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 6) [Constant]_output [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus112_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: ReLU114_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Pooling160_Output_0_reshape0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 11) [Constant]_output [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Times212_Output_0 [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: (Unnamed Layer* 13) [Constant]_output [1] [I] [TRT] [INT8 Quantization] INT8 Inference Tensor Scales: Plus214_Output_0 [1] [I] [TRT] Original: 15 layers [I] [TRT] After dead-layer removal: 15 layers [I] [TRT] After DLA optimization: 13 layers [I] [TRT] After scale fusion: 13 layers [I] [TRT] After vertical fusions: 13 layers [I] [TRT] After swap: 13 layers [I] [TRT] After final dead-layer removal: 13 layers [I] [TRT] After tensor merging: 13 layers [I] [TRT] After concat removal: 13 layers [I] [TRT] Configuring builder for Int8 Mode completed in 0.0084503 seconds. [I] [TRT] Graph construction and optimization completed in 0.00888536 seconds. 
[W] [TRT] Warning: no implementation of (Unnamed Layer* 1) [Constant] obeys the requested constraints, using a higher precision type [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.006912 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.006976 [W] [TRT] Warning: no implementation of (Unnamed Layer* 6) [Constant] obeys the requested constraints, using a higher precision type [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.00544 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.00736 [W] [TRT] Warning: no implementation of (Unnamed Layer* 11) [Constant] obeys the requested constraints, using a higher precision type [W] [TRT] Warning: no implementation of (Unnamed Layer* 13) [Constant] obeys the requested constraints, using a higher precision type [I] [TRT] [I] [TRT] --------------- Timing Input3 to nvm(9) [I] [TRT] Tactic 0 time 0.006944 [I] [TRT] [I] [TRT] --------------- Timing {(Unnamed Layer* 0) [Convolution]}(31) [I] [TRT] Tactic 548859524883 is the only option, timing skipped [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.008832 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.00736 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007168 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.011136 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.00688 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007232 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.008832 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007136 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007232 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007136 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.005248 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.00512 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.008832 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007136 [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 2) ElementWise [I] [TRT] Tactic 1 time 0.009536 [I] [TRT] Tactic 2 time 0.01232 [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 2) ElementWise [I] [TRT] Tactic 1 time 0.008896 [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 2) ElementWise [I] [TRT] Tactic 1 time 0.009152 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007008 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.0104 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007936 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.006976 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.008768 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.006944 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.0112 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007008 [I] [TRT] [I] [TRT] --------------- Timing {(Unnamed Layer* 3) [Activation],(Unnamed Layer* 4) [Pooling],(Unnamed Layer* 5) [Convolution]}(31) [I] [TRT] Tactic 548859524883 is the only option, timing skipped [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.009344 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 
time 0.006944 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.0072 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.00864 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.00688 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.005216 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.008608 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007136 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.006944 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007488 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.005344 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.005408 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.009088 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.005248 [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 7) ElementWise [I] [TRT] Tactic 1 time 0.009216 [I] [TRT] Tactic 2 time 0.009184 [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 7) ElementWise [I] [TRT] Tactic 1 time 0.008608 [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 7) ElementWise [I] [TRT] Tactic 1 time 0.00864 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.006944 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.009344 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007264 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.007232 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.009216 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.005344 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.008896 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.006976 [I] [TRT] [I] [TRT] --------------- Timing {(Unnamed Layer* 8) [Activation],(Unnamed Layer* 9) [Pooling]}(31) [I] [TRT] Tactic 548859524883 is the only option, timing skipped [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.009152 [I] [TRT] [I] [TRT] --------------- Timing (9) [I] [TRT] Tactic 0 time 0.0088 [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 10) Shuffle [I] [TRT] Tactic 0 is the only option, timing skipped [W] [TRT] Warning: no implementation of (Unnamed Layer* 10) [Shuffle] obeys the requested constraints, using a higher precision type [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 12) Matrix Multiply [I] [TRT] Tactic 0 is the only option, timing skipped [W] [TRT] Warning: no implementation of (Unnamed Layer* 12) [Matrix Multiply] obeys the requested constraints, using a higher precision type [I] [TRT] [I] [TRT] --------------- Timing (Unnamed Layer* 14) ElementWise [I] [TRT] Tactic 1 is the only option, timing skipped [W] [TRT] Warning: no implementation of (Unnamed Layer* 14) [ElementWise] obeys the requested constraints, using a higher precision type [I] [TRT] Adding reformat layer: (Unnamed Layer* 1) [Constant] output to be reformatted 0 ((Unnamed Layer* 1) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,8) [I] [TRT] Adding reformat layer: (Unnamed Layer* 6) [Constant] output to be reformatted 0 ((Unnamed Layer* 6) [Constant]_output) from Int8(1,1,1:32,1) to Float(1,1,1,16) [I] [TRT] Adding reformat layer: (Unnamed Layer* 10) [Shuffle] reformatted input 0 (Pooling160_Output_0) from 
Int8(1,4,16:32,16) to Float(1,4,16,256)
[I] [TRT] Formats and tactics selection completed in 3.93059 seconds.
[I] [TRT] After reformat layers: 22 layers
[I] [TRT] Block size 16777216
[I] [TRT] Block size 25088
[I] [TRT] Block size 1024
[I] [TRT] Block size 512
[I] [TRT] Total Activation Memory: 16803840
[I] [TRT] Detected 1 input and 1 output network tensors.
NVMEDIA_DLA : 528, ERROR: load from memory failed.
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
[E] [TRT] dla/dlaUtils.cpp (171) - DLA Error in deserialize: 7 (Failure to load program.)
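As a side note, one way to check whether the DLA path works at all on the board, independently of this sample, is the trtexec binary shipped in the same tarball. Assuming the 5.1 build of trtexec supports these options (they may differ between releases), something along these lines should exercise the same DLA loader:

./trtexec --onnx=/root/tensorrt_sample/mnist_data/mnist.onnx --int8 --useDLACore=0 --allowGPUFallback

If that reproduces the same NVMEDIA_DLA "load from memory failed" error, the problem is more likely in the DLA firmware/driver stack of the board image than in the sample itself.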

MTammvee · Aug 17 '22 12:08

5.1.5 is too old, and DLA is only available on the Jetson platform, like Jetson Xavier, NX, Nano, Orin...
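A quick way to confirm whether the target exposes a DLA at all is to query the builder for the number of DLA cores before requesting one. A minimal sketch against the TensorRT 5.x API follows; the logger class is a placeholder, not part of the samples:

#include <iostream>
#include "NvInfer.h"

// Minimal logger required by createInferBuilder (placeholder implementation).
class SimpleLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

int main()
{
    SimpleLogger logger;
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    // 0 means the platform has no DLA; Xavier-class devices typically report 2 cores.
    std::cout << "DLA cores available: " << builder->getNbDLACores() << std::endl;
    builder->destroy();  // TensorRT 5.x objects are released with destroy()
    return 0;
}

If this prints 0, --useDLACore will never work on that platform regardless of the TensorRT version.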

zerollzeng · Aug 18 '22 12:08

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks!

ttyio · Dec 06 '22 02:12