How to perform PTQ calibration for a hybrid-quantization model (INT8 & FP16)
Description
What is the right way to calibrate a hybrid-quantization model?
I built my TensorRT engine from an ONNX model with the code below, using a Calibrator class derived from trt.IInt8EntropyCalibrator2 as the config.int8_calibrator.
My hybrid-quantized super-resolution model's inference results are biased towards magenta, even though I already clip the outputs. What could be the possible reason? Is there an issue with my calibration code, or is it a poor distribution of the calibration dataset? I am confident that my inference program is correct.
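A minimal sketch of the kind of Calibrator class this refers to, following the standard trt.IInt8EntropyCalibrator2 pattern. It assumes the calibration stream exposes batch_size, batch_nbytes, and a next_batch() method returning preprocessed float32 batches; those attribute names are illustrative, since the real class definition is omitted here.

import os

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, calibration_stream, cache_file):
        super().__init__()
        self.stream = calibration_stream
        self.cache_file = cache_file
        # Device buffer sized for one calibration batch
        # (batch_nbytes is an assumed attribute of the stream object).
        self.d_input = cuda.mem_alloc(self.stream.batch_nbytes)

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        batch = self.stream.next_batch()  # assumed to return np.float32 arrays
        if batch is None:
            return None  # no more data: calibration stops here
        cuda.memcpy_htod(self.d_input, np.ascontiguousarray(batch))
        return [int(self.d_input)]

    def read_calibration_cache(self):
        # Reuse a cache from a previous run if one exists.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, 'rb') as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, 'wb') as f:
            f.write(cache)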
import tensorrt as trt

def build_engine_onnx(model_file, engine_file_path, min_shape, opt_shape, max_shape, calibration_stream):
    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB workspace
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)
    # Enable strong type matching (unused):
    # config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    # Add calibrator
    calibrator = Calibrator(calibration_stream, 'calibration.cache')
    config.int8_calibrator = calibrator
    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    # Dynamic input shapes:
    # profile.set_shape(input_name, min_shape, opt_shape, max_shape)
    # Fixed input shape: use opt_shape directly as the static input shape
    network.get_input(0).shape = opt_shape
    config.add_optimization_profile(profile)
    print(f"Building TensorRT engine from file {model_file}...")
    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("Failed to build the TensorRT engine!")
    with open(engine_file_path, "wb") as f:
        f.write(bytearray(plan))
    return plan
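For context, I call the function roughly like this. The file names, the shape, and the CalibrationStream helper are placeholders for a fixed-shape super-resolution input:

# Hypothetical invocation; CalibrationStream and the shapes are placeholders.
shape = (1, 3, 256, 256)
stream = CalibrationStream('calib_images/', batch_size=1)  # hypothetical helper
build_engine_onnx('model.onnx', 'model.plan', shape, shape, shape, stream)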
Environment
TensorRT Version: 10.0.1
NVIDIA GPU: RTX4090
NVIDIA Driver Version: 12.0
CUDA Version: 12.0
CUDNN Version: 8.2.0
Operating System: Linux interactive11554 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Python Version (if applicable): 3.8.19