TensorRT
Unable to convert an onnx model to tensorrt using int8 with calibration set
Description
Hi, I was going to use Polygraphy's TensorRT converter and calibrator, but this model uses InstanceNormalization, so the ONNX parser flag has to be set: parser.set_flag(trt.OnnxParserFlag.NATIVE_INSTANCENORM). I am therefore trying to do everything with the TensorRT Python API directly. This is my code:
```python
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import onnx
import tensorrt as trt
import torch

calset = torch.load("cs.pt")


def calibration_data_stream():
    for i in range(len(calset)):
        im_patches = calset[i][0].cpu().numpy()
        train_feat = calset[i][1].cpu().numpy()
        target_labels = calset[i][2].cpu().numpy()
        train_ltrb = calset[i][3].cpu().numpy()
        yield [im_patches, train_feat, target_labels, train_ltrb]


class EntropyCalibrator2(trt.IInt8EntropyCalibrator2):
    def __init__(self, calibration_stream, cache_file):
        # input_layers: a list of dictionaries containing names and shapes of the input layers
        # cache_file: path to save calibration cache
        super(EntropyCalibrator2, self).__init__()
        self.calibration_stream = calibration_stream
        self.cache_file = cache_file
        self.batch_size = 1
        self.current_index = 0
        self.device_input_buffers = []  # To hold device input buffers
        self.allocate_buffers()

    def allocate_buffers(self):
        for tensors in next(iter(self.calibration_stream)):
            for tensor in tensors:
                volume = trt.volume(tensor.shape)
                print("allocate_buffers")
                print(volume)
                print(tensor.nbytes)
                # dtype = np.float32
                self.device_input_buffers.append(cuda.mem_alloc(tensor.nbytes))

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            for name in names:
                print(name)
            data = next(self.calibration_stream)
            for input_tensor, b in zip(data, self.device_input_buffers):
                # if name not in self.device_input_buffers:
                #     raise ValueError(f"Buffer for {name} not allocated")
                if not isinstance(input_tensor, np.ndarray) or input_tensor.dtype != np.float32:
                    raise TypeError("Input tensor must be a np.ndarray with dtype np.float32")
                # if np.prod(input_tensor.shape) * input_tensor.dtype.itemsize != b.size:
                #     raise ValueError("Input tensor size does not match the allocated buffer size")
                cuda.memcpy_htod(b, np.ascontiguousarray(input_tensor))
                print("get batch")
                print(type(b))
                print(b)
                print(int(b))
            return [int(b) for b in self.device_input_buffers]
        except StopIteration:
            return []

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)


calibration_data_stream_gen = calibration_data_stream()
calibrator = EntropyCalibrator2(calibration_data_stream_gen, "calibration_cache.bin")

input_layers = [
    {'name': 'im_patches', 'shape': (1, 3, 288, 288)},
    {'name': 'train_feat', 'shape': (1, 256, 18, 18)},
    {'name': 'target_labels', 'shape': (1, 1, 18, 18)},
    {'name': 'train_ltrb', 'shape': (1, 4, 18, 18)}
]

# Constants
ONNX_MODEL_PATH = 'new_full_explicit_batch32.onnx'
TENSORRT_ENGINE_PATH = 'new_full_explicit_batch32.engine'
# ONNX_MODEL_PATH = 'new_full_implicit_batch_16.onnx'
# TENSORRT_ENGINE_PATH = 'new_full_implicit_batch_16.engine'
MIN_BATCH_SIZE = 1
MAX_BATCH_SIZE = 32

# Set up the logger
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Create a TensorRT builder, runtime, and network
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
parser = trt.OnnxParser(network, TRT_LOGGER)
parser.set_flag(trt.OnnxParserFlag.NATIVE_INSTANCENORM)

# Parse the ONNX model file
with open(ONNX_MODEL_PATH, 'rb') as model:
    if not parser.parse(model.read()):
        print('ERROR: Failed to parse the ONNX file.')
        for error in range(parser.num_errors):
            print(parser.get_error(error))
        exit(1)

# Define optimization profile for dynamic batch size
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = calibrator
profile = builder.create_optimization_profile()
profile.set_shape('im_patches', (MIN_BATCH_SIZE, 3, 288, 288), (MAX_BATCH_SIZE, 3, 288, 288), (MAX_BATCH_SIZE, 3, 288, 288))
profile.set_shape('train_feat', (MIN_BATCH_SIZE, 256, 18, 18), (MAX_BATCH_SIZE, 256, 18, 18), (MAX_BATCH_SIZE, 256, 18, 18))
profile.set_shape('target_labels', (1, MIN_BATCH_SIZE, 18, 18), (1, MAX_BATCH_SIZE, 18, 18), (1, MAX_BATCH_SIZE, 18, 18))
profile.set_shape('train_ltrb', (MIN_BATCH_SIZE, 4, 18, 18), (MAX_BATCH_SIZE, 4, 18, 18), (MAX_BATCH_SIZE, 4, 18, 18))
config.add_optimization_profile(profile)

# Build the engine
engine = builder.build_serialized_network(network, config)

# Save the engine
with open(TENSORRT_ENGINE_PATH, 'wb') as f:
    f.write(engine)
```
and I am getting this error:
[02/08/2024-03:36:43] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[02/08/2024-03:36:43] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[02/08/2024-03:36:43] [TRT] [W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[02/08/2024-03:36:43] [TRT] [W] Calibration Profile is not defined. Calibrating with Profile 0
[02/08/2024-03:37:01] [TRT] [E] 1: [genericReformat.cu::genericReformat::executeMemcpy::1583] Error Code 1: Cuda Runtime (invalid argument)
[02/08/2024-03:37:02] [TRT] [E] 3: [engine.cpp::nvinfer1::rt::Engine::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: engine.cpp::nvinfer1::rt::Engine::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior.
)
[02/08/2024-03:37:02] [TRT] [E] 2: [calibrator.cpp::nvinfer1::builder::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
Traceback (most recent call last):
File "D:\pyth\pytracking-master2\pytracking\band13.py", line 131, in
The first warning that I don't like is that the calibration profile is not defined, after which follows the executeMemcpy error, which I assume comes from cuda.memcpy_htod? Correct me if I'm wrong...
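One possible reading of the two messages, not confirmed anywhere in this thread: as I understand the TensorRT developer guide, calibration of a dynamic-shape network runs at the OPT dimensions of the calibration profile (profile 0 here, since none was set with `config.set_calibration_profile`). The sketch below is plain Python with no TensorRT calls. It takes the OPT shapes from the `profile.set_shape(...)` lines and the per-sample shapes from `input_layers`, and compares the byte sizes under the assumption that every input is float32. The helper names `nbytes` and `find_size_mismatches` are made up for illustration.

```python
from math import prod

FLOAT32_BYTES = 4  # assumption: all four inputs are float32

# OPT shapes copied from the profile.set_shape(...) calls in the script above.
profile_opt_shapes = {
    "im_patches": (32, 3, 288, 288),
    "train_feat": (32, 256, 18, 18),
    "target_labels": (1, 32, 18, 18),
    "train_ltrb": (32, 4, 18, 18),
}

# Per-sample shapes copied from the input_layers list in the script above.
calibration_shapes = {
    "im_patches": (1, 3, 288, 288),
    "train_feat": (1, 256, 18, 18),
    "target_labels": (1, 1, 18, 18),
    "train_ltrb": (1, 4, 18, 18),
}


def nbytes(shape, itemsize=FLOAT32_BYTES):
    """Bytes needed for a dense tensor of the given shape."""
    return prod(shape) * itemsize


def find_size_mismatches(expected, actual):
    """Return {name: (actual_bytes, expected_bytes)} for inputs whose sizes differ."""
    mismatches = {}
    for name, shape in expected.items():
        want, have = nbytes(shape), nbytes(actual[name])
        if want != have:
            mismatches[name] = (have, want)
    return mismatches


mismatches = find_size_mismatches(profile_opt_shapes, calibration_shapes)
for name, (have, want) in mismatches.items():
    print(f"{name}: calibration buffer holds {have} bytes, OPT shape needs {want}")
```

If this reading is right, either the calibration batches would need to match the OPT shapes, or a dedicated calibration profile whose shapes match the calibration data would need to be set.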
My journey with TensorRT has been a very difficult one, because there is no clear explanation anywhere and the documentation is very vague. I seemed to follow the examples, but sadly I cannot get this right...
onnx model link: https://drive.google.com/file/d/1ajZQShdSqj1IEHNQFa5Z0I5keiBHagsK/view?usp=sharing
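A side note on the script in the description: `allocate_buffers()` calls `next(iter(self.calibration_stream))` on the same generator that `get_batch()` reads from later, so the first calibration sample is consumed while sizing the buffers and is never fed to the calibrator. Whether this matters for the reported error is not established here. The sketch below reproduces the effect with a plain generator; the string samples are hypothetical stand-ins for the real tensors.

```python
def calibration_data_stream(samples):
    """Yield one calibration sample at a time, like the generator in the script."""
    for sample in samples:
        yield sample


samples = ["sample0", "sample1", "sample2"]  # hypothetical stand-ins for tensors

# Pattern from the script: peek at the shared generator to size the buffers.
stream = calibration_data_stream(samples)
peeked = next(iter(stream))

# What later get_batch() calls would see: the peeked sample is gone.
remaining = list(stream)
print("peeked:", peeked, "remaining:", remaining)

# One alternative: peek on a separate generator and keep the real one untouched.
first_for_sizing = next(calibration_data_stream(samples))
fresh_stream = calibration_data_stream(samples)
all_batches = list(fresh_stream)
print("batches seen by the calibrator:", all_batches)
```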
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: GTX 1660 Ti
NVIDIA Driver Version: 546.01
CUDA Version: 12.1
CUDNN Version: 8.9.7
Operating System: Windows 10
Python Version (if applicable): 3.10.13
PyTorch Version (if applicable): 2.1.2+cu121
Could you please try TRT 9.2/9.3? I checked our latest internal release and found that calibration succeeds with polygraphy convert new_full_explicit_batch32.onnx --int8 -o out.plan, so this seems to be a fixed issue.
@zerollzeng thank you for your answer. I just wanted to write that I had made a mistake in my code. It now converts successfully in TensorRT 8.6; I just changed the dataset and now everything is OK.
@zerollzeng Now I'm getting the warnings below; a lot of layers are not converted to int8. How can I fix that?
[02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 0) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 153) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 154) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 156) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 157) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 160) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 161) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /Transpose_4_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 179) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 181) [Shuffle]_output, expect fall back to non-int8 implementation for any 
layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 182) [Convolution]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 185) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 186) [Convolution]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 189) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 190) [Convolution]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 195) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 198) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 200) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 201) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 203) 
[Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 204) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 206) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 207) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 209) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 210) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 212) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 213) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 215) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 216) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 223) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 224) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 228) [Softmax]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /transformer/encoder/layers.0/self_attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 233) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 234) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 235) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 236) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 240) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 241) [Constant]_output, expect fall back to non-int8 implementation 
for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 242) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 243) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 245) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 246) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 248) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 249) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 252) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 253) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 255) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 256) 
[Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 259) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 260) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 261) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 262) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 265) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 266) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 268) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 269) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 271) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 272) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 274) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 275) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 277) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 278) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 280) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 281) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 288) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 289) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 293) [Softmax]_output, expect fall back to non-int8 implementation for any layer consuming or 
producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /transformer/encoder/layers.1/self_attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 298) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 299) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 300) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 301) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 305) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 306) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 307) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 308) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 310) 
[Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 311) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 313) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 314) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 317) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 318) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 320) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 321) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 324) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 325) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 326) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 330) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 333) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 334) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 336) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 337) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 339) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 340) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 342) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 343) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or 
producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 345) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 346) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 353) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 354) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 358) [Softmax]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /transformer/encoder/layers.2/self_attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 363) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 364) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 366) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 371) 
[Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 372) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 373) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 375) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 376) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 378) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 379) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 389) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 390) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 391) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] 
Missing scale and zero-point for tensor (Unnamed Layer* 392) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 395) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 396) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 398) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 399) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 401) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 402) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 404) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 405) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor [02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 407) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or 
producing
@ttyio, I noticed another post about this issue, and I think you can help.
There are about 3 times more layers that could not be converted to int8. Can you suggest how I should approach this? Can I write plugins so that these layers are implemented? Looking at this, I think there are only a few unsupported layer types that are used many times... Out of about 1500 layers, more than 500 don't have an int8 implementation. If I must learn to write plugins, could you please point me to the docs where I can learn?
Thank you
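To test the guess that only a few layer types account for most of the warnings, the build log can be tallied by layer type. The following is a minimal sketch using only the Python standard library; the regular expression is written against the warning format quoted above, and the function name `tally_missing_scales` is made up for illustration. The sample text is a short excerpt of that log.

```python
import re
from collections import Counter

# Matches either "(Unnamed Layer* N) [Type]_output" or a named tensor.
WARNING_RE = re.compile(
    r"Missing scale and zero-point for tensor "
    r"(?:\(Unnamed Layer\* \d+\) \[(?P<kind>[^\]]+)\]_output|(?P<named>\S+)),"
)


def tally_missing_scales(log_text):
    """Count 'Missing scale and zero-point' warnings per layer type."""
    # Collapse line wraps so a warning split across lines still matches.
    normalized = " ".join(log_text.split())
    counts = Counter()
    for match in WARNING_RE.finditer(normalized):
        counts[match.group("kind") or "named tensor"] += 1
    return counts


sample_log = """
[02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 0) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 154)
[Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor /Transpose_4_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[02/12/2024-00:29:48] [TRT] [W] Missing scale and zero-point for tensor (Unnamed Layer* 234) [Matrix Multiply]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
"""

counts = tally_missing_scales(sample_log)
for kind, n in counts.most_common():
    print(f"{kind}: {n}")
```

Run against the full log, a table like this shows at a glance whether the fallbacks are mostly Constant and Shuffle tensors or whether compute-heavy layers are affected too.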
@ninono12345, it is safe to ignore those warnings. Usually the conv/gemm layers dominate the performance, so we can already get decent performance after quantizing those layers. To further improve performance, we can check the output of trtexec --dumpLayerInfo --separateProfileRun --dumpProfile.
For plugins, the open-source code is at https://github.com/NVIDIA/TensorRT/tree/release/8.6/plugin, and the documentation is at https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#extending
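The trtexec run suggested above could be scripted along these lines. This is only a sketch: the helper name `build_trtexec_profile_command` is hypothetical, and it assumes the already-built engine file from the original script is profiled via `--loadEngine` with an explicit `--shapes` entry per dynamic input. The command is only assembled and printed here, not executed.

```python
import shlex


def build_trtexec_profile_command(engine_path, shapes=None):
    """Assemble a trtexec argv list for per-layer profiling of a built engine."""
    cmd = [
        "trtexec",
        f"--loadEngine={engine_path}",
        "--dumpLayerInfo",        # print per-layer information
        "--separateProfileRun",   # profile in a separate run from the timing run
        "--dumpProfile",          # print per-layer timings
    ]
    if shapes:
        # trtexec expects name:AxBxC entries, comma-separated.
        entries = []
        for name, dims in shapes.items():
            dims_text = "x".join(str(d) for d in dims)
            entries.append(f"{name}:{dims_text}")
        cmd.append("--shapes=" + ",".join(entries))
    return cmd


cmd = build_trtexec_profile_command(
    "new_full_explicit_batch32.engine",
    {"im_patches": (1, 3, 288, 288), "train_feat": (1, 256, 18, 18)},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where trtexec is on the PATH.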
@ttyio thank you very much, I will check on these
Closing since there has been no activity for more than 3 weeks, per our policy. Thanks all!