                        sampleNMT on DLA
Description
No layer in sampleNMT runs on DLA. Is there any easy way to run these layers on DLA?
Environment
JetPack 5.0.1 on AGX Orin and JetPack 4.6 on AGX Xavier
Steps To Reproduce
I followed the sample and its prerequisites here: https://github.com/NVIDIA/TensorRT/tree/release/8.0/samples/sampleNMT#prerequisites. The engine build logs are included below. Not a single layer runs on DLA (I have several other networks whose layers do run on DLA).
Engine LOGS
[W] [TRT] Default DLA is enabled but layer Embedding matrix is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Gather in embedding is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer LSTM encoder is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Matrix in multiplicative attention is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Attention Keys MM in multiplicative attention is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Replicate input sequence lengths for decoder is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Reshape encoder states for decoder initialization 0 is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Replicate encoder states for decoder initialization 0 is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Reshape encoder states for decoder initialization 1 is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Replicate encoder states for decoder initialization 1 is not supported on DLA, falling back to GPU.
[W] [TRT] setDefaultDeviceType was called but no layer in the network can run on DLA.
[I] [TRT]
[I] [TRT] --------------- Layers running on DLA:
[I] [TRT]
[I] [TRT] --------------- Layers running on GPU:
[I] [TRT] Embedding matrix, Gather in embedding, LSTM encoder, Replicate input sequence lengths for decoder, Matrix in multiplicative attention, Reshape encoder states for decoder initialization 0, Reshape encoder states for decoder initialization 1, Replicate encoder states for decoder initialization 1, Replicate encoder states for decoder initialization 0, Attention Keys MM in multiplicative attention,
[W] [TRT] No implementation of layer Gather in embedding obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer LSTM encoder obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Replicate input sequence lengths for decoder obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Replicate encoder states for decoder initialization 1 obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Replicate encoder states for decoder initialization 0 obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation obeys reformatting-free rules, at least 2 reformatting nodes are needed, now picking the fastest path instead.
[I] [TRT] Detected 6 inputs and 5 output network tensors.
[W] [TRT] Default DLA is enabled but layer Embedding matrix is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Gather in embedding is not supported on DLA, falling back to GPU.
[W] [TRT] Concatenate embedded input and attention: DLA only supports concatenation on the C dimension.
[W] [TRT] DLA LAYER: Batch size (combined volume except for CHW dimensions) 128 for layer Concatenate embedded input and attention exceeds max batch size allowed of 32.
[W] [TRT] Default DLA is enabled but layer Concatenate embedded input and attention is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Reshape input for LSTM decoder is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer LSTM decoder is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Reshape output from LSTM decoder is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Raw Alignment Scores MM (Queries x Keys) in multiplicative attention is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Context Ragged Softmax is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Context Matrix Multiply is not supported on DLA, falling back to GPU.
[W] [TRT] Concatinate decoder output and context: DLA only supports concatenation on the C dimension.
[W] [TRT] DLA LAYER: Batch size (combined volume except for CHW dimensions) 128 for layer Concatinate decoder output and context exceeds max batch size allowed of 32.
[W] [TRT] Default DLA is enabled but layer Concatinate decoder output and context is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Attention Matrix is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Attention Matrix Multiply is not supported on DLA, falling back to GPU.
[W] [TRT] DLA LAYER: Batch size (combined volume except for CHW dimensions) 128 for layer (Unnamed Layer* 12) [Activation] exceeds max batch size allowed of 32.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 12) [Activation] is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Projection matrix is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Projection Matrix Multiply is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Replicate beam likelihoods is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Softmax in likelihood calculation is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer TopK 1st in likelihood calculation is not supported on DLA, falling back to GPU.
[W] [TRT] DLA LAYER: Batch size (combined volume except for CHW dimensions) 128 for layer EltWise multiplication in likelihood calculation exceeds max batch size allowed of 32.
[W] [TRT] Default DLA is enabled but layer EltWise multiplication in likelihood calculation is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Reshape combined likelihoods is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer TopK 2nd in likelihood calculation is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Reshape vocabulary indices is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Shuffle vocabulary indices is not supported on DLA, falling back to GPU.
[W] [TRT] setDefaultDeviceType was called but no layer in the network can run on DLA.
[I] [TRT]
[I] [TRT] --------------- Layers running on DLA:
[I] [TRT]
[I] [TRT] --------------- Layers running on GPU:
[I] [TRT] Embedding matrix, Gather in embedding, Replicate beam likelihoods, (Unnamed Layer* 1) [Gather]_output copy, input_attention copy, Reshape input for LSTM decoder, LSTM decoder, Reshape output from LSTM decoder, Raw Alignment Scores MM (Queries x Keys) in multiplicative attention, Context Ragged Softmax, Context Matrix Multiply, (Unnamed Layer* 5) [Shuffle]_output copy, (Unnamed Layer* 8) [Matrix Multiply]_output copy, Attention Matrix, Attention Matrix Multiply, (Unnamed Layer* 12) [Activation], Projection matrix, Projection Matrix Multiply, Softmax in likelihood calculation + TopK 1st in likelihood calculation, EltWise multiplication in likelihood calculation, Reshape combined likelihoods, TopK 2nd in likelihood calculation, Reshape vocabulary indices, Shuffle vocabulary indices,
[W] [TRT] No implementation of layer Gather in embedding obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Replicate beam likelihoods obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Context Ragged Softmax obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Softmax in likelihood calculation + TopK 1st in likelihood calculation obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer TopK 2nd in likelihood calculation obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Reshape vocabulary indices obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Shuffle vocabulary indices obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation obeys reformatting-free rules, at least 8 reformatting nodes are needed, now picking the fastest path instead.
[I] [TRT] Detected 9 inputs and 6 output network tensors.
[W] [TRT] Default DLA is enabled but layer Shuffle decoder states 0 is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Shuffle decoder states 1 is not supported on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer Shuffle attention is not supported on DLA, falling back to GPU.
[W] [TRT] setDefaultDeviceType was called but no layer in the network can run on DLA.
[I] [TRT]
[I] [TRT] --------------- Layers running on DLA:
[I] [TRT]
[I] [TRT] --------------- Layers running on GPU:
[I] [TRT] Shuffle attention, Shuffle decoder states 1, Shuffle decoder states 0,
[W] [TRT] No implementation of layer Shuffle attention obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Shuffle decoder states 1 obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[W] [TRT] No implementation of layer Shuffle decoder states 0 obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
[I] [TRT] Detected 4 inputs and 3 output network tensors.
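For context, the DLA placement above was requested through the usual TensorRT C++ builder configuration, roughly as in the sketch below (a minimal generic sketch, not sampleNMT's actual argument handling; the sample wires this up internally from its command-line options):

// Minimal sketch: requesting DLA with GPU fallback via the TensorRT C++ builder API.
// Assumes an INetworkDefinition has already been populated; error handling omitted.
#include "NvInfer.h"

void configureDLA(nvinfer1::IBuilderConfig* config)
{
    // Route every layer to DLA by default; unsupported layers fall back to the GPU
    // only because kGPU_FALLBACK is set (otherwise engine building would fail).
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
    config->setDLACore(0);

    // DLA only runs FP16 or INT8 layers; FP32 layers cannot be placed on DLA.
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
}

With this configuration the warnings above show every layer being pushed back to the GPU, so the whole engine ends up running on the GPU anyway.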
It's expected that these layers cannot run on DLA, because DLA doesn't support them. You can find the layers supported by DLA at https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_layers
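If you want to check this programmatically rather than by reading the build log, a rough sketch with the TensorRT 8.x C++ API (using IBuilderConfig::canRunOnDLA; the surrounding setup is illustrative) looks like this:

// Minimal sketch: report which layers of a network TensorRT could place on DLA.
// Assumes network and config were created with the TensorRT 8.x C++ API and the
// config already has DLA enabled as in the snippet above.
#include "NvInfer.h"
#include <cstdio>

void reportDLASupport(const nvinfer1::INetworkDefinition* network,
                      const nvinfer1::IBuilderConfig* config)
{
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        const nvinfer1::ILayer* layer = network->getLayer(i);
        // canRunOnDLA() mirrors the per-layer check the builder performs before
        // emitting the "falling back to GPU" warnings seen in the log.
        std::printf("%s -> %s\n", layer->getName(),
                    config->canRunOnDLA(layer) ? "DLA" : "GPU fallback");
    }
}

Any layer reported as "GPU fallback" by such a check corresponds to one of the warnings in the log above; for this network that is every layer, so there is no easy way to move them onto DLA.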
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!