How to use a calibration cache file to generate an INT8 engine in C++
I have successfully generated a calibration cache file (dataset.cache) for my dataset using Polygraphy. I now want to load that cache file and build an INT8 engine using the C++ API.
This is the function I'm using to convert an ONNX model into a TensorRT engine:
```cpp
std::unique_ptr<nvinfer1::ICudaEngine> createCudaEngine(const std::string& onnxFileName, nvinfer1::ILogger& logger, int batchSize, ENGINE_TYPE type = FP32)
{
    // TensorRT factory functions return raw pointers, so construct the smart pointers explicitly.
    std::unique_ptr<IBuilder> builder{createInferBuilder(logger)};
    std::unique_ptr<INetworkDefinition> network{builder->createNetworkV2(1U << static_cast<unsigned>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH))};
    std::unique_ptr<nvonnxparser::IParser> parser{nvonnxparser::createParser(*network, logger)};
    if (!parser->parseFromFile(onnxFileName.c_str(), static_cast<int>(ILogger::Severity::kINFO)))
        throw std::runtime_error("ERROR: could not parse ONNX model " + onnxFileName + " !");

    // The optimization profile is owned by the builder, so keep it as a raw pointer.
    IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", OptProfileSelector::kMIN, Dims2{batchSize, 3});
    profile->setDimensions("input", OptProfileSelector::kOPT, Dims2{batchSize, 3});
    profile->setDimensions("input", OptProfileSelector::kMAX, Dims2{batchSize, 3});

    std::unique_ptr<IBuilderConfig> config{builder->createBuilderConfig()};
    config->setMaxWorkspaceSize(64 * 1024 * 1024);
    config->addOptimizationProfile(profile);
    if (type == INT8)
    {
        // how to load and use the calibration cache file here?
        config->setFlag(BuilderFlag::kINT8);
    }
    return std::unique_ptr<ICudaEngine>{builder->buildEngineWithConfig(*network, *config)};
}
```
@zerollzeng Please help
Hi, I think you have a few options:
- Using the Polygraphy CLI: the example here illustrates that you can pass the calibration cache and rebuild the engine without needing to provide a calibration dataset this time. I'm referring to the section "[Optional] Rebuild the engine using the cache to skip calibration". Command:

  `polygraphy convert identity.onnx --int8 --calibration-cache identity_calib.cache -o identity.engine`

- Using the Polygraphy API: the example here shows how to define a custom calibrator with a cache path specified. Note that per the documentation, this path is not only saved to but also loaded from.

- Using the C++ API: for this, I think you'll need to supply a custom calibrator to the builder config with its readCalibrationCache() method implemented so that the builder knows where to look. An old example illustrates this, but using the Python API: https://github.com/NVIDIA/TensorRT/issues/945. A C++ sketch follows below.
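To make the C++ option concrete, here's a minimal sketch of such a cache-only calibrator. This is an assumption-laden sketch, not code from the linked example: the class name CacheCalibrator and the "dataset.cache" path are placeholders, and the method signatures follow TensorRT 8.x (older releases omit the noexcept qualifiers). Because getBatch() returns false, the builder receives no calibration batches and falls back entirely on readCalibrationCache():

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

#include <NvInfer.h>

// Hypothetical cache-only calibrator: it supplies no calibration batches and
// simply replays the cache that polygraphy already generated. Derive from the
// same calibrator flavor that produced the cache (I believe polygraphy defaults
// to IInt8EntropyCalibrator2) so the algorithm recorded in the cache matches.
class CacheCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    explicit CacheCalibrator(std::string cachePath) : mCachePath(std::move(cachePath)) {}

    // Unused when no calibration batches are supplied, but required by the interface.
    int32_t getBatchSize() const noexcept override { return 1; }

    // Returning false tells the builder there is no calibration data,
    // forcing it to rely entirely on the cache.
    bool getBatch(void* bindings[], char const* names[], int32_t nbBindings) noexcept override
    {
        return false;
    }

    // Load the cache file from disk and hand its contents to the builder.
    void const* readCalibrationCache(std::size_t& length) noexcept override
    {
        mCache.clear();
        std::ifstream input(mCachePath, std::ios::binary);
        if (input.good())
        {
            mCache.assign(std::istreambuf_iterator<char>(input),
                          std::istreambuf_iterator<char>());
        }
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    // Nothing to write back; the cache already exists on disk.
    void writeCalibrationCache(void const* /*cache*/, std::size_t /*length*/) noexcept override {}

private:
    std::string mCachePath;
    std::vector<char> mCache;
};
```

You could then wire it into the INT8 branch of your createCudaEngine(), keeping the calibrator alive until buildEngineWithConfig() returns:

```cpp
// Declared outside the if-block so it outlives engine building.
std::unique_ptr<CacheCalibrator> calibrator;
if (type == INT8)
{
    config->setFlag(BuilderFlag::kINT8);
    calibrator = std::make_unique<CacheCalibrator>("dataset.cache"); // path to your polygraphy cache
    config->setInt8Calibrator(calibrator.get());
}
```

One caveat: the cache records which calibration algorithm produced it, so if the base class here doesn't match what polygraphy used, the builder may reject the cache.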