Clip layer upper bound not respected by TRT 10.x in MatMul->Add->Clip chains
Description
While troubleshooting a severe accuracy degradation of our model after conversion from ONNX to TRT (fp32 precision, default settings), we noticed the following apparent bug in TensorRT:
Whenever there is a chain of MatMul->Add->Clip, TensorRT 10.x appears to ignore any upper bound (max) specified for the Clip layer if the lower bound (min) is set to 0. In such cases, it replaces the Clip with an unbounded ReLU (i.e. max=+inf).
In practice this happens, for example, when a torch.nn.ReLU6 activation follows a torch.nn.Linear layer.
Most likely this is due to a missing check for an upper Clip bound when TensorRT detects such a combination of layers with a lower bound of 0.
The result is a massive reduction in accuracy of our model, without using any quantization.
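To make the difference concrete, here is a small NumPy sketch of the two behaviours described above. It only illustrates the arithmetic and is not TensorRT code; the function names and input values are made up for this example.

```python
import numpy as np


def expected_clip(x, w, b, lo=0.0, hi=6.0):
    """Reference semantics of MatMul -> Add -> Clip(min=lo, max=hi)."""
    return np.clip(x @ w + b, lo, hi)


def observed_relu(x, w, b):
    """What TRT appears to compute instead: a ReLU without upper bound."""
    return np.maximum(x @ w + b, 0.0)


# Hypothetical inputs with the shapes from the minimum example (2x3, 3x4, 4).
x = np.array([[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]], dtype=np.float32)
w = np.full((3, 4), 2.0, dtype=np.float32)
b = np.array([0.0, 1.0, -1.0, 0.5], dtype=np.float32)

print(float(expected_clip(x, w, b).max()))  # 6.0
print(float(observed_relu(x, w, b).max()))  # 13.0
```

The two results agree wherever the pre-activation is at most 6 and differ only above that, which matches the pattern in the polygraphy output.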
Environment
TensorRT Version: 10.13.3.9 and 10.12.0.36 (both affected)
NVIDIA GPU: A100, H200
NVIDIA Driver Version: 575.51.03
CUDA Version: 12.8
CUDNN Version: 9.8
Torch Version: 2.8.0
Operating System: Linux
Minimum failing example
The following minimal ONNX graph (opset 13 or higher) reproduces the issue:
matmul_input  (2x3) --+
                      |
matmul_weight (3x4) --+--> MatMul --+
                                    |
                                    +--> Add --+
add_bias (4) -----------------------+          |
                                               |
                                               +--> Clip (min=0, max=6) --> clip_output (2x4)
Note that min and max are constant scalar input tensors, passed as initializers to the graph.
The shape of the other input tensors does not seem to matter; in our case the issue also occurred with larger tensors.
Steps To Reproduce
Commands or scripts: To reproduce the issue on such a minimum example, it is sufficient to run polygraphy with random inputs, e.g. using the following command line:
polygraphy run --trt --onnxrt minimum_example.onnx --val-range [-10,10] --iterations 100
The raw polygraphy output shows that with the ONNX RT runner, the output tensor clip_output is properly clipped at the upper bound (here: 6), whereas with the TRT runner the upper bound is not respected (so that, e.g., the mean and max stats exceed it). The comparison fails.
Have you tried the latest release?: Yes, issue persists
Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system: n/a
Can this model run on other frameworks? Yes. With the ONNX RT runner instead of the TRT runner, the issue does not occur.
Example polygraphy output
[I] Comparing Output: 'clip_output' (dtype=float32, shape=(2, 4)) with 'clip_output' (dtype=float32, shape=(2, 4))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-10/27/25-14:23:30: clip_output | Stats: mean=31.249, std-dev=46.829, var=2192.9, median=11.64, min=0 at (0, 0), max=146.4 at (1, 3), avg-magnitude=31.249, p90=82.23, p95=114.32, p99=139.98
[I] ---- Values ----
[[ 0. 12.467738 10.812315 54.727707]
[ 0. 25.580246 0. 146.40123 ]]
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0 , 14.6) | 5 | ########################################
(14.6, 29.3) | 1 | ########
(29.3, 43.9) | 0 |
(43.9, 58.6) | 1 | ########
(58.6, 73.2) | 0 |
(73.2, 87.8) | 0 |
(87.8, 102 ) | 0 |
(102 , 117 ) | 0 |
(117 , 132 ) | 0 |
(132 , 146 ) | 1 | ########
[I] onnxrt-runner-N0-10/27/25-14:23:30: clip_output | Stats: mean=3.75, std-dev=2.9047, var=8.4375, median=6, min=0 at (0, 0), max=6 at (0, 1), avg-magnitude=3.75, p90=6, p95=6, p99=6
[I] ---- Values ----
[[0. 6. 6. 6.]
[0. 6. 0. 6.]]
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0 , 14.6) | 8 | ########################################
(14.6, 29.3) | 0 |
(29.3, 43.9) | 0 |
(43.9, 58.6) | 0 |
(58.6, 73.2) | 0 |
(73.2, 87.8) | 0 |
(87.8, 102 ) | 0 |
(102 , 117 ) | 0 |
(117 , 132 ) | 0 |
(132 , 146 ) | 0 |
...
[E] FAILED | Output: 'clip_output' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Mismatched outputs: ['clip_output']
[E] Accuracy Summary | trt-runner-N0-10/27/25-14:23:30 vs. onnxrt-runner-N0-10/27/25-14:23:30 | Passed: 0/100 iterations | Pass Rate: 0.0%
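The values in this log are consistent with the missing-upper-bound hypothesis: clamping the TRT output at 6 reproduces the ONNX RT output exactly for the iteration shown. A quick NumPy check, with the arrays copied from the output above (the variable names are ours):

```python
import numpy as np

# clip_output as printed by the TRT runner.
trt_out = np.array(
    [[0.0, 12.467738, 10.812315, 54.727707],
     [0.0, 25.580246, 0.0, 146.40123]],
    dtype=np.float32,
)

# clip_output as printed by the ONNX RT runner.
ort_out = np.array(
    [[0.0, 6.0, 6.0, 6.0],
     [0.0, 6.0, 0.0, 6.0]],
    dtype=np.float32,
)

# Applying the missing upper bound to the TRT result recovers the reference.
matches = np.array_equal(np.minimum(trt_out, 6.0), ort_out)
print(matches)  # True
```

This covers only the single iteration printed in the log, so it supports the hypothesis rather than proving it.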