Standalone mode bug for DLA
Description
Serializing a network built for NVDLA standalone mode fails; it produces the following crash log:
DLA Cores: 2
4: [standardEngineBuilder.cpp::engineValidationForSafeDLAMode::1519] Error Code 4: Internal Error (Safe DLA is enabled but not all layers are running on DLA.)
Segmentation fault (core dumped)
Environment
TensorRT Version: 8.2
NVIDIA GPU: Drive Orin
NVIDIA Driver Version:
CUDA Version: 11.4
CUDNN Version: 8.3.2
Operating System: Ubuntu
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce
#include <iostream>
#include <cassert>
#include <fstream>
#include "NvInfer.h"

using namespace nvinfer1;

class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        // suppress info-level messages
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} logger;

int main() {
    IRuntime* rt = createInferRuntime(logger);
    std::cout << "DLA Cores: " << rt->getNbDLACores() << std::endl;

    IBuilder* builder = createInferBuilder(logger);
    builder->setMaxBatchSize(1);

    IBuilderConfig* config = builder->createBuilderConfig();
    config->setFlag(BuilderFlag::kFP16);
    config->setMaxWorkspaceSize(1024 * 1024 * 1024);
    config->setDefaultDeviceType(DeviceType::kDLA);
    config->setDLACore(0);
    // config->setFlag(BuilderFlag::kGPU_FALLBACK);
    config->setEngineCapability(EngineCapability::kDLA_STANDALONE);

    INetworkDefinition* network = builder->createNetworkV2(0);
    Dims32 dim32{4, {1, 32, 32, 32}};
    ITensor* input = network->addInput("input", DataType::kHALF, dim32);
    input->setAllowedFormats(TensorFormats(TensorFormat::kCHW16));

    ILayer* relu = network->addActivation(*input, ActivationType::kRELU);
    // ILayer *relu = network->addUnary(*input, UnaryOperation::kABS);
    relu->setName("relu");
    // relu->setOutputType(0, DataType::kHALF);
    assert(config->canRunOnDLA(relu));

    ITensor* output = relu->getOutput(0);
    output->setName("output");
    network->markOutput(*output);
    output->setType(DataType::kHALF);
    output->setAllowedFormats(TensorFormats(TensorFormat::kCHW16));

    // buildEngineWithConfig() likely returns nullptr when the safe-DLA
    // validation fails, so the serialize() call below is where the
    // segmentation fault happens.
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    IHostMemory* serialized = engine->serialize();
    assert(serialized);

    std::ofstream p("relu.trt", std::ios::binary);
    p.write((const char*)serialized->data(), serialized->size());
    // IHostMemory *plan = builder->buildSerializedNetwork(*network, *config);
    return 0;
}
If you enable verbose logging:
class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        std::cout << msg << std::endl;
    }
} logger;
you should see output like:
>>>>>>>>>>>>>>> Chose Runner Type: DLA Tactic: 0x0000000000000003
Adding reformat layer: Reformatted Input Tensor 0 to {ForeignNode[relu]} (input) from Half(4096,4096,1:8,128,4) to Half(32768,32768,1024,32,1)
Adding reformat layer: Reformatted Output Tensor 0 to {ForeignNode[relu]} (output) from Half(2048,2048,1024:16,32,1) to Half(4096,4096,1:8,128,4)
Formats and tactics selection completed in 0.0722914 seconds.
After reformat layers: 3 layers
Pre-optimized block assignment.
Block size 1073741824
Total Activation Memory: 1073741824
Detected 2 NvMedia tensors.
Layer: Reformatting CopyNode for Input Tensor 0 to {ForeignNode[relu]} Host Persistent: 0 Device Persistent: 0 Scratch Memory: 0
Layer: {ForeignNode[relu]} Host Persistent: 848 Device Persistent: 0 Scratch Memory: 0
Layer: Reformatting CopyNode for Output Tensor 0 to {ForeignNode[relu]} Host Persistent: 0 Device Persistent: 0 Scratch Memory: 0
This means TRT still inserts two reformat layers into the engine; those reformat layers do not run on the DLA, which is why you see the error.
To solve this, you need to enable all of the DLA-supported FP16 formats for both the input and the output. Refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#restrictions-with-dla
Modify your code as follows:
...
input->setAllowedFormats(TensorFormats(
    1U << static_cast<int>(TensorFormat::kCHW16) |
    1U << static_cast<int>(TensorFormat::kDLA_HWC4) |
    1U << static_cast<int>(TensorFormat::kDLA_LINEAR)));
...
output->setAllowedFormats(TensorFormats(
    1U << static_cast<int>(TensorFormat::kCHW16) |
    1U << static_cast<int>(TensorFormat::kDLA_LINEAR)));
...
@zerollzeng thanks for your answer, it works for me!
Closing since there has been no activity for more than 14 days; please reopen if you still have questions, thanks!