How to perform PTQ calibration for a hybrid-quantization model (INT8 & FP16)
Description
What is the right way to calibrate a hybrid-quantization model?
I built my TensorRT engine from an ONNX model with the code below, using a Calibrator class derived from trt.IInt8EntropyCalibrator2 as the config.int8_calibrator.
My hybrid-quantized super-resolution model's inference results are biased towards magenta, even though I already clip the outputs. What could be the possible reason? Is there an issue with my calibration code, or is it a poor distribution of the calibration dataset? I am confident that my inference program is correct.
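A minimal sketch of the kind of Calibrator class this refers to, following the standard trt.IInt8EntropyCalibrator2 pattern. It assumes the calibration stream exposes batch_size, batch_nbytes, and a next_batch() method returning preprocessed float32 batches; those attribute names are illustrative, since the real class definition is omitted here.

import os

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, calibration_stream, cache_file):
        super().__init__()
        self.stream = calibration_stream
        self.cache_file = cache_file
        # Device buffer sized for one calibration batch
        # (batch_nbytes is an assumed attribute of the stream object).
        self.d_input = cuda.mem_alloc(self.stream.batch_nbytes)

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        batch = self.stream.next_batch()  # assumed to return np.float32 arrays
        if batch is None:
            return None  # no more data: calibration stops here
        cuda.memcpy_htod(self.d_input, np.ascontiguousarray(batch))
        return [int(self.d_input)]

    def read_calibration_cache(self):
        # Reuse a cache from a previous run if one exists.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, 'rb') as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, 'wb') as f:
            f.write(cache)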
import tensorrt as trt

def build_engine_onnx(model_file, engine_file_path, min_shape, opt_shape, max_shape, calibration_stream):
    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB workspace
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)
    # Enable strong type matching (unused):
    # config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    # Add calibrator
    calibrator = Calibrator(calibration_stream, 'calibration.cache')
    config.int8_calibrator = calibrator
    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    # Dynamic input shapes:
    # profile.set_shape(input_name, min_shape, opt_shape, max_shape)
    # Fixed input shape: use opt_shape directly as the static input shape
    network.get_input(0).shape = opt_shape
    config.add_optimization_profile(profile)
    print(f"Building TensorRT engine from file {model_file}...")
    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("Failed to build the TensorRT engine!")
    with open(engine_file_path, "wb") as f:
        f.write(bytearray(plan))
    return plan
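For context, I call the function roughly like this. The file names, the shape, and the CalibrationStream helper are placeholders for a fixed-shape super-resolution input:

# Hypothetical invocation; CalibrationStream and the shapes are placeholders.
shape = (1, 3, 256, 256)
stream = CalibrationStream('calib_images/', batch_size=1)  # hypothetical helper
build_engine_onnx('model.onnx', 'model.plan', shape, shape, shape, stream)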
Environment
TensorRT Version: 10.0.1
NVIDIA GPU: RTX4090
NVIDIA Driver Version: 12.0
CUDA Version: 12.0
CUDNN Version: 8.2.0
Operating System: Linux interactive11554 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Python Version (if applicable): 3.8.19