Unable to convert an onnx model to tensorrt using int8 with calibration set

Open ninono12345 opened this issue 1 year ago • 6 comments

Description

Hi, I was going to use Polygraphy's converter and calibrator to build the TensorRT engine, but this model uses InstanceNormalization, so the ONNX parser flag has to be set with parser.set_flag(trt.OnnxParserFlag.NATIVE_INSTANCENORM). I am therefore trying to do everything with the TensorRT Python API. This is my code:

`
import pycuda.driver as cuda
import pycuda.autoinit

import numpy as np
import onnx
import tensorrt as trt
import torch

calset = torch.load("cs.pt")

def calibration_data_stream():
    for i in range(len(calset)):
        im_patches = calset[i][0].cpu().numpy()
        train_feat = calset[i][1].cpu().numpy()
        target_labels = calset[i][2].cpu().numpy()
        train_ltrb = calset[i][3].cpu().numpy()
        yield [im_patches, train_feat, target_labels, train_ltrb]

class EntropyCalibrator2(trt.IInt8EntropyCalibrator2):
    def __init__(self, calibration_stream, cache_file):
        # input_layers: a list of dictionaries containing names and shapes of the input layers
        # cache_file: path to save calibration cache
        super(EntropyCalibrator2, self).__init__()
        self.calibration_stream = calibration_stream
        self.cache_file = cache_file
        self.batch_size = 1
        self.current_index = 0
        self.device_input_buffers = []  # To hold device input buffers
        self.allocate_buffers()

    def allocate_buffers(self):
        for tensors in next(iter(self.calibration_stream)):
            for tensor in tensors:
                volume = trt.volume(tensor.shape)
                print("allocate_buffers")
                print(volume)
                print(tensor.nbytes)
                # dtype = np.float32
                self.device_input_buffers.append(cuda.mem_alloc(tensor.nbytes))

    def get_batch_size(self):
        return 1
    
    def get_batch(self, names):
        try:
            for name in names:
                print(name)
            data = next(self.calibration_stream)
            for input_tensor, b in zip(data, self.device_input_buffers):
                # if name not in self.device_input_buffers:
                    # raise ValueError(f"Buffer for {name} not allocated")
                    
                if not isinstance(input_tensor, np.ndarray) or input_tensor.dtype != np.float32:
                    raise TypeError("Input tensor must be a np.ndarray with dtype np.float32")
                
                # if np.prod(input_tensor.shape) * input_tensor.dtype.itemsize != b.size:
                    # raise ValueError("Input tensor size does not match the allocated buffer size")
                
                cuda.memcpy_htod(b, np.ascontiguousarray(input_tensor))
                print("get batch")
                print(type(b))
                print(b)
                print(int(b))
            return [int(b) for b in self.device_input_buffers]
        except StopIteration:
            return []
    
    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

calibration_data_stream_gen = calibration_data_stream()
calibrator = EntropyCalibrator2(calibration_data_stream_gen, "calibration_cache.bin")

input_layers = [
    {'name': 'im_patches', 'shape': (1, 3, 288, 288)},
    {'name': 'train_feat', 'shape': (1, 256, 18, 18)},
    {'name': 'target_labels', 'shape': (1, 1, 18, 18)},
    {'name': 'train_ltrb', 'shape': (1, 4, 18, 18)}
]

# Constants
ONNX_MODEL_PATH = 'new_full_explicit_batch32.onnx'
TENSORRT_ENGINE_PATH = 'new_full_explicit_batch32.engine'
# ONNX_MODEL_PATH = 'new_full_implicit_batch_16.onnx'
# TENSORRT_ENGINE_PATH = 'new_full_implicit_batch_16.engine'
MIN_BATCH_SIZE = 1
MAX_BATCH_SIZE = 32

# Set up the logger
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Create a TensorRT builder, runtime, and network
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
parser = trt.OnnxParser(network, TRT_LOGGER)
parser.set_flag(trt.OnnxParserFlag.NATIVE_INSTANCENORM)

# Parse the ONNX model file
with open(ONNX_MODEL_PATH, 'rb') as model:
    if not parser.parse(model.read()):
        print('ERROR: Failed to parse the ONNX file.')
        for error in range(parser.num_errors):
            print(parser.get_error(error))
        exit(1)

# Define optimization profile for dynamic batch size
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = calibrator
profile = builder.create_optimization_profile()
profile.set_shape('im_patches', (MIN_BATCH_SIZE, 3, 288, 288), (MAX_BATCH_SIZE, 3, 288, 288), (MAX_BATCH_SIZE, 3, 288, 288))
profile.set_shape('train_feat', (MIN_BATCH_SIZE, 256, 18, 18), (MAX_BATCH_SIZE, 256, 18, 18), (MAX_BATCH_SIZE, 256, 18, 18))
profile.set_shape('target_labels', (1, MIN_BATCH_SIZE, 18, 18), (1, MAX_BATCH_SIZE, 18, 18), (1, MAX_BATCH_SIZE, 18, 18))
profile.set_shape('train_ltrb', (MIN_BATCH_SIZE, 4, 18, 18), (MAX_BATCH_SIZE, 4, 18, 18), (MAX_BATCH_SIZE, 4, 18, 18))
config.add_optimization_profile(profile)

# Build the engine
engine = builder.build_serialized_network(network, config)

# Save the engine
with open(TENSORRT_ENGINE_PATH, 'wb') as f:
    f.write(engine)

`

and I am getting this error:

[02/08/2024-03:36:43] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[02/08/2024-03:36:43] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[02/08/2024-03:36:43] [TRT] [W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[02/08/2024-03:36:43] [TRT] [W] Calibration Profile is not defined. Calibrating with Profile 0
[02/08/2024-03:37:01] [TRT] [E] 1: [genericReformat.cu::genericReformat::executeMemcpy::1583] Error Code 1: Cuda Runtime (invalid argument)
[02/08/2024-03:37:02] [TRT] [E] 3: [engine.cpp::nvinfer1::rt::Engine::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: engine.cpp::nvinfer1::rt::Engine::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior. )
[02/08/2024-03:37:02] [TRT] [E] 2: [calibrator.cpp::nvinfer1::builder::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
Traceback (most recent call last):
  File "D:\pyth\pytracking-master2\pytracking\band13.py", line 131, in <module>
    f.write(engine)
TypeError: a bytes-like object is required, not 'NoneType'
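The final TypeError in this log is a downstream symptom: builder.build_serialized_network returns None when the build (here, the calibration step) fails, and the script then hands that None to f.write. A minimal sketch of a guard follows; `save_engine` is a hypothetical helper name, not part of TensorRT.

```python
def save_engine(serialized_engine, path):
    """Write a serialized engine to disk, failing loudly if the build failed.

    `serialized_engine` is whatever builder.build_serialized_network(...)
    returned. TensorRT returns None when the build does not succeed, so the
    real cause is in the [TRT] [E] lines logged before this point.
    """
    if serialized_engine is None:
        raise RuntimeError(
            "Engine build failed; see the [TRT] [E] messages logged above."
        )
    with open(path, "wb") as f:
        # f.write returns the number of bytes written.
        return f.write(serialized_engine)
```

Called as `save_engine(engine, TENSORRT_ENGINE_PATH)`, this replaces the confusing TypeError with an error that points back at the build log.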

The first warning that I don't like is "Calibration Profile is not defined", after which the executeMemcpy error follows, which I assume comes from cuda.memcpy_htod? Correct me if I'm wrong...
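On the "Calibration Profile is not defined" warning: without a calibration profile, TensorRT calibrates with profile 0. As far as the TensorRT documentation describes it, calibration runs at the opt dimensions of that profile, which in the script above is batch 32, while the calibrator's device buffers hold a single batch-1 sample. That mismatch is one plausible cause of the executeMemcpy failure, although the thread does not confirm it. The sketch below pins a separate calibration profile to the calibration batch shape. `add_calibration_profile` and `CALIB_SHAPES` are hypothetical names; the shapes are copied from `input_layers` in the post.

```python
# Shape of one calibration batch per input, copied from `input_layers`.
CALIB_SHAPES = {
    "im_patches": (1, 3, 288, 288),
    "train_feat": (1, 256, 18, 18),
    "target_labels": (1, 1, 18, 18),
    "train_ltrb": (1, 4, 18, 18),
}


def add_calibration_profile(builder, config, shapes=CALIB_SHAPES):
    """Attach a calibration profile with min == opt == max for every input.

    `builder` is expected to be a trt.Builder and `config` a
    trt.IBuilderConfig; only their documented methods are called here.
    """
    profile = builder.create_optimization_profile()
    for name, shape in shapes.items():
        # Calibration data must match this shape exactly.
        profile.set_shape(name, shape, shape, shape)
    config.set_calibration_profile(profile)
    return profile
```

The regular optimization profile with the 1 to 32 batch range would still be added with config.add_optimization_profile; the calibration profile only governs the shapes used while calibrating.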

My journey with TensorRT has been a very difficult one, because there is no explanation anywhere and the documentation is very vague. I tried to follow the examples, but sadly I cannot get this right...

onnx model link: https://drive.google.com/file/d/1ajZQShdSqj1IEHNQFa5Z0I5keiBHagsK/view?usp=sharing

Environment

TensorRT Version: 8.6.1

NVIDIA GPU: GTX 1660 Ti

NVIDIA Driver Version: 546.01

CUDA Version: 12.1

CUDNN Version: 8.9.7

Operating System: Windows 10

Python Version (if applicable): 3.10.13

PyTorch Version (if applicable): 2.1.2+cu121

ninono12345 avatar Feb 08 '24 00:02 ninono12345

Could you please try TRT 9.2/9.3? I checked our latest internal release and found that calibration succeeds with polygraphy convert new_full_explicit_batch32.onnx --int8 -o out.plan, so this seems to be a fixed issue.

zerollzeng avatar Feb 12 '24 17:02 zerollzeng

@zerollzeng thank you for your answer. I just wanted to write that I had made a mistake in my code. The model now converts successfully in TensorRT 8.6; I just changed the dataset and now everything is OK.
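The thread does not say what was wrong with the dataset. One way to catch this class of problem before the build is to check every calibration sample against the shapes and dtype the calibrator expects. The sketch below is an illustration under that assumption; `check_sample` and `EXPECTED_SHAPES` are hypothetical names, and the shapes come from `input_layers` in the original post.

```python
# Expected per-sample shapes, in the order the data stream yields them.
EXPECTED_SHAPES = [
    ("im_patches", (1, 3, 288, 288)),
    ("train_feat", (1, 256, 18, 18)),
    ("target_labels", (1, 1, 18, 18)),
    ("train_ltrb", (1, 4, 18, 18)),
]


def check_sample(sample, expected=EXPECTED_SHAPES, dtype="float32"):
    """Return a list of problems found in one calibration sample.

    `sample` is a sequence of array-likes exposing .shape and .dtype
    (NumPy arrays, for example), in the same order as `expected`.
    An empty list means the sample looks consistent.
    """
    if len(sample) != len(expected):
        return [f"expected {len(expected)} tensors, got {len(sample)}"]
    problems = []
    for (name, shape), arr in zip(expected, sample):
        if tuple(arr.shape) != shape:
            problems.append(f"{name}: shape {tuple(arr.shape)} != {shape}")
        if str(arr.dtype) != dtype:
            problems.append(f"{name}: dtype {arr.dtype} != {dtype}")
    return problems
```

Running this over every item of calibration_data_stream() before constructing the calibrator reports mismatches as readable messages instead of a CUDA "invalid argument" during calibration.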

ninono12345 avatar Feb 12 '24 17:02 ninono12345

@zerollzeng Now I'm getting the following warnings. A lot of layers are not converted to int8; how can I fix that?

[02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 0) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 153) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 154) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 156) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 157) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 160) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 161) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /Transpose_4_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 179) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 181) [Shuffle]_output, expect fall back to non-int8 implementation for any 
layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 182) [Convolution]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 185) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 186) [Convolution]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 189) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 190) [Convolution]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 195) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 198) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 200) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 201) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 203) 
[Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 204) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 206) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 207) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 209) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 210) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 212) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 213) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 215) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 216) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 223) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 224) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 228) [Softmax]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /transformer/encoder/layers.0/self_attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 233) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 234) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 235) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 236) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 240) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 241) [Constant]_output, expect fall back to non-int8 implementation 
for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 242) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 243) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 245) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 246) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 248) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 249) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 252) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 253) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 255) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 256) 
[Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 259) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 260) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 261) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 262) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 265) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 266) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 268) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 269) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 271) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 272) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 274) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 275) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 277) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 278) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 280) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 281) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 288) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 289) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 293) [Softmax]_output, expect fall back to non-int8 implementation for any layer consuming or 
producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /transformer/encoder/layers.1/self_attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 298) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 299) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 300) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 301) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 305) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 306) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 307) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 308) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 310) 
[Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 311) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 313) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 314) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 317) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 318) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 320) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 321) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 324) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 325) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 326) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 330) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 333) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 334) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 336) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 337) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 339) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 340) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 342) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 343) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or 
producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 345) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 346) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 353) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 354) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 358) [Softmax]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /transformer/encoder/layers.2/self_attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 363) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 364) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 366) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 371) 
[Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 372) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 373) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 375) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 376) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 378) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 379) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 389) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 390) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 391) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 392) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 395) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 396) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 398) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 399) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 401) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 402) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 404) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 405) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 407) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or 
producing

ninono12345 avatar Feb 12 '24 18:02 ninono12345

@ttyio, I noticed another post about this issue, and I think you can help.

About three times more layers than shown here could not be converted to int8. Can you suggest how I should approach this? Can I write plugins so that these layers are implemented? Looking at the log, I think there are only a few unsupported layer types that are used many times. Out of about 1500 layers, more than 500 don't have an int8 implementation. If I must learn to write plugins, could you please point me to the docs where I can learn how?
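The guess that only a few layer types account for most of the warnings can be checked by tallying the log. The sketch below groups the "Missing scale and zero-point" warnings by the layer type in the tensor name. `count_fallbacks` is a hypothetical helper, and the regular expressions assume the exact message wording shown in the log above.

```python
import re
from collections import Counter

# Captures the tensor name between "for tensor " and ", expect fall back".
WARNING_RE = re.compile(
    r"Missing scale and zero-point for tensor (.+?), expect fall back"
)
# Unnamed tensors end in "[LayerType]_output".
LAYER_TYPE_RE = re.compile(r"\[([^\]]+)\]_output$")


def count_fallbacks(log_text):
    """Count fallback warnings per layer type in a TensorRT build log."""
    # The pasted log wraps mid-message, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for tensor in WARNING_RE.findall(flat):
        match = LAYER_TYPE_RE.search(tensor)
        counts[match.group(1) if match else "(named tensor)"] += 1
    return counts
```

Calling `count_fallbacks(open("build.log").read()).most_common()` on a saved log (the file name is an assumption) lists the layer types by frequency, which shows how much of the list is Constant and Shuffle outputs rather than compute-heavy layers.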

Thank you

ninono12345 avatar Feb 12 '24 18:02 ninono12345

@ninono12345, it is safe to ignore those warnings. Usually the conv/gemm layers dominate the performance, so you can already get decent performance after quantizing those layers. To improve performance further, you can check the output of trtexec --dumpLayerInfo --separateProfileRun --dumpProfile. For plugins, the open-source code is at https://github.com/NVIDIA/TensorRT/tree/release/8.6/plugin, and the documentation is at https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#extending
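The trtexec invocation suggested above can be assembled from Python so that the model path and calibration cache stay in one place. This is a sketch: `trtexec_profile_cmd` is a hypothetical helper, the file names are the ones used earlier in the thread, and adding `--onnx`, `--int8`, and `--calib` to the three flags named above is an assumption about how the run would be set up. A model with dynamic shapes may also need shape flags that are not shown here.

```python
import shlex


def trtexec_profile_cmd(onnx_path, calib_cache=None, trtexec="trtexec"):
    """Build a trtexec command line for per-layer INT8 profiling."""
    cmd = [trtexec, f"--onnx={onnx_path}", "--int8"]
    if calib_cache:
        # Reuse the cache written by write_calibration_cache().
        cmd.append(f"--calib={calib_cache}")
    # The three flags suggested in the reply above.
    cmd += ["--dumpLayerInfo", "--separateProfileRun", "--dumpProfile"]
    return cmd


# Print the command so it can be copied into a shell.
print(shlex.join(trtexec_profile_cmd(
    "new_full_explicit_batch32.onnx", "calibration_cache.bin"
)))
```

The returned list can also be passed directly to subprocess.run. The per-layer timings in the output indicate which layers actually dominate the runtime, which is the check to make before investing in plugins.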

ttyio avatar Feb 12 '24 19:02 ttyio

@ttyio thank you very much, I will check on these

ninono12345 avatar Feb 13 '24 07:02 ninono12345

Closing since there has been no activity for more than 3 weeks, per our policy. Thanks all!

ttyio avatar May 07 '24 18:05 ttyio