NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name 'Conv_0_quant'
Describe the issue
Following the documentation, I dynamically quantized a ResNet-based model. The model is quantized and saved without error. However, when I try to create an inference session using the quantized model, the code crashes with the following error.
>>> ort_session = ort.InferenceSession(int8_path, providers=['CPUExecutionProvider'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chinmay/anaconda3/envs/v_pytorch2/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/home/chinmay/anaconda3/envs/v_pytorch2/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name 'Conv_0_quant'
This is a duplicate of #12558, which was closed a few months ago, so I assumed support had been added to onnxruntime, but I am still getting the same error.
To reproduce
import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType
model_fp32 = 'path/to/the/model.onnx'
model_quant = 'path/to/the/model.quant.onnx'
quantized_model = quantize_dynamic(model_fp32, model_quant)
import onnxruntime as ort
ort_session = ort.InferenceSession(model_quant, providers=['CPUExecutionProvider'])
Urgency
Not very urgent, but not low priority either.
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.14.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
I am getting the same issue. I might try reverting to a previous build just to see if it was working before.
I just tried the previous releases; they didn't work. Upon inspecting the code, it looks as if the patch may never have made it into a release. Here and here
There should be a line that says class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 10, uint8_t_int8_t, ConvInteger);
Maybe @jchen351 knows what happened.
I'm having the same issue. Can I help get the patch applied?
Same issue here. Do we have a timeline for the patch?
On the similar issue #3130, there is a comment that may temporarily solve the issue. But when we change weight_type=QuantType.QInt8 to QuantType.QUInt8, the quantized ONNX model seems to perform slower, as also mentioned in that issue. That also happened to me.
Besides, I think these issues happen because ONNX does not support ConvInteger for the signed int8 data type, here.
In the ONNX Runtime docs, in the Method Selection subsection, it says:
In general, it is recommended to use dynamic quantization for RNNs and transformer-based models, and static quantization for CNN models.
So I tried to follow the end-to-end example given in the docs, and it worked; the weights are int8. Here are the steps I took (a minimal sketch follows the list):
- Convert FaceNet-InceptionResNet to an ONNX model.
- Create a CalibrationDataReader using some facial images.
- Execute quantize_static().
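For reference, here is a rough, self-contained sketch of those steps once the model is already exported to ONNX; the file names, input name, and image shape are placeholders, and the random arrays stand in for real calibration images:
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class ImageCalibrationReader(CalibrationDataReader):
    # Feeds a small set of preprocessed images to the calibrator, one dict per call.
    def __init__(self, images, input_name):
        self._iter = iter([{input_name: img} for img in images])

    def get_next(self):
        return next(self._iter, None)  # None tells the calibrator we are done

# Placeholder calibration data: replace with real preprocessed face crops.
calib_images = [np.random.rand(1, 3, 160, 160).astype(np.float32) for _ in range(8)]
reader = ImageCalibrationReader(calib_images, input_name="input")  # "input" is assumed

quantize_static(
    "facenet_inception_resnet.onnx",       # FP32 model exported to ONNX (placeholder name)
    "facenet_inception_resnet.int8.onnx",  # output path (placeholder name)
    calibration_data_reader=reader,
    weight_type=QuantType.QInt8,
)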
I have the same problem. Unfortunately, I am finding that onnxruntime does not support the ConvInteger operator, which means dynamic quantization does not work in onnxruntime if the initial model contains CNNs. Very sad!
I found a workaround to solve this, by setting:
quantize_dynamic(input_model, output_model, weight_type=QuantType.QInt8, nodes_to_exclude=['/conv1/Conv'])
Here, nodes_to_exclude should be the list of Conv layer names in your model. You may find them in the error message when loading the model using InferenceSession.
For example, for the error message mentioned in the title of this issue, it would be: 'Conv_0_quant'.
Another workaround is to exclude all operators causing the issue. For example:
In:
quantize_dynamic(input_model, output_model, weight_type=QuantType.QInt8, op_types_to_quantize=['MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization'])
"Conv" was removed from op_types_to_quantize.
I'm also facing the same error and have tried solving this problem.
Same issue for segformer model
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name '/segformer/encoder/patch_embeddings.0/proj/Conv_quant'
I just hit this issue too. I will skip quantizing Conv operators for now.
Same error; it only works with QUInt8 but not QInt8.
I found a workaround to solve this, by setting:
quantize_dynamic(input_model, output_model, weight_type=QuantType.QInt8, nodes_to_exclude=['/conv1/Conv'])
Here, nodes_to_exclude should be the list of Conv layer names in your model. You may find them in the error message when loading the model using InferenceSession. For example, for the error message mentioned in the title of this issue, it would be: 'Conv_0_quant'.
Hi, I tried this while trying to quantize a Whisper model (seq2seq) (see here); however, I am getting the following error:
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
I tried to use nodes_to_exclude in my AutoQuantizationConfig to exclude the node but the error is still the same
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False, nodes_to_exclude=['/conv1/Conv_quant'])
Any help would be appreciated! 🙏
I had to better describe the operators to quantize and the following worked for me:
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
dqconfig.nodes_to_exclude = ['Conv_quant']
dqconfig.operators_to_quantize = ['MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization']
Hope this helps, Thomas
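To make that config runnable end to end, here is a rough sketch of wiring it into Optimum's ORTQuantizer; the model directory, file name, and save path are placeholders, and the exact API may vary between Optimum versions:
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
dqconfig.nodes_to_exclude = ['Conv_quant']
dqconfig.operators_to_quantize = ['MatMul', 'Attention', 'LSTM', 'Gather', 'Transpose', 'EmbedLayerNormalization']

# "onnx_model_dir" is a placeholder for a directory containing the exported model.onnx.
quantizer = ORTQuantizer.from_pretrained('onnx_model_dir', file_name='model.onnx')
quantizer.quantize(save_dir='onnx_model_dir_quantized', quantization_config=dqconfig)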
still only works with QUInt8 for me
@ogencoglu were you able to solve the issue? I am facing the same problem with quantization, and I also get higher inference time with ONNX (as can be seen here). Any help will be appreciated.
@fazankabir No solution found so far.
I am able to do quantization with:
model_fp32 = 'model_Segformer.onnx'
model_quant = "model_dynamic_quant.onnx"
quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QUInt8)
instead of QuantType.QInt8
But while doing inference it takes even more time. Do you also see higher inference time after exporting the SegFormer to ONNX, @ogencoglu?
I haven't tested that (ONNX is a must for my case), but the quantized ONNX model has a longer inference time than the full-precision one. So I ditched SegFormer altogether. @Fazankabir
Yeah, you're right: a lot of quantized operators in ONNX run on the CPU instead of the GPU (maybe they aren't supported yet), so it can take more time to run some quantized models.
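One way to check where nodes actually end up is to enable verbose session logging and look at the graph-partitioning / node-placement messages; a rough sketch with a placeholder model path, assuming the onnxruntime-gpu package is installed:
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0  # VERBOSE: logs graph partitioning / node placement details

sess = ort.InferenceSession(
    'model_dynamic_quant.onnx',  # placeholder path
    sess_options=so,
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)
print(sess.get_providers())  # providers actually registered for this session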
Any news about this topic? I have the same issue with this code:
import onnx
import onnxruntime as ort
import numpy as np
model = onnx.parser.parse_model(
"""
<ir_version: 5, opset_import: [ "" : 18 ]>
agraph (int8[N, 3, 32, 32] X) => (int32[N, 16, ?, ?] Y){
Y = ConvInteger(X, W)
}
"""
)
model.graph.initializer.extend(
[onnx.numpy_helper.from_array(np.ones((16, 3, 3, 3), "int8"), "W")]
)
# Inference
feed = {"X": np.ones((1, 3, 32, 32), "int8")}
sess = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
# Error is raised here
y = sess.run(None, feed)[0]
- Run quantize_dynamic with an empty nodes_to_exclude and try to create an InferenceSession:
quantized_model = quantize_dynamic(
model_path,
quantized_model_path,
weight_type=QuantType.QInt8,
nodes_to_exclude=[],
)
import onnxruntime as ort
ort_session = ort.InferenceSession(
quantized_model_path, providers=["CPUExecutionProvider"]
)
- Find the failing node in the error message:
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for ConvInteger(10) node with name '/model/patch_embed/proj/Conv_quant'
- Add that node to nodes_to_exclude, deleting the _quant suffix (/model/patch_embed/proj/Conv_quant => /model/patch_embed/proj/Conv), and quantize again:
quantized_model = quantize_dynamic(
model_path,
quantized_model_path,
weight_type=QuantType.QInt8,
nodes_to_exclude=[
"/model/patch_embed/proj/Conv" # /model/patch_embed/proj/Conv_quant => /model/patch_embed/proj/Conv
],
)
import onnxruntime as ort
ort_session = ort.InferenceSession(
quantized_model_path, providers=["CPUExecutionProvider"]
)
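The same procedure can be looped until the session loads; here is a rough sketch (the helper name is made up, and the regex parses the node name out of the error text shown above):
import re
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic

def quantize_with_auto_exclude(model_path, quantized_model_path, max_rounds=100):
    excluded = []
    for _ in range(max_rounds):
        quantize_dynamic(
            model_path,
            quantized_model_path,
            weight_type=QuantType.QInt8,
            nodes_to_exclude=excluded,
        )
        try:
            ort.InferenceSession(quantized_model_path, providers=['CPUExecutionProvider'])
            return excluded  # the session loads: done
        except Exception as err:  # NOT_IMPLEMENTED errors carry the node name
            match = re.search(r"node with name '([^']+)'", str(err))
            if match is None:
                raise
            name = match.group(1)
            if name.endswith('_quant'):
                name = name[:-len('_quant')]  # Conv_quant => Conv
            excluded.append(name)
    raise RuntimeError('Gave up after excluding %d nodes' % len(excluded))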
I have a ResNet-based model that I cannot quantize to any format except QuantType.QUInt8; I would like to quantize it to other formats too, but without excluding the Conv nodes. Also, has anyone managed to run the quantized model on GPU using CUDAExecutionProvider? Even if I list the CUDA provider first, it does the inference with the CPU provider, which takes more time than the unquantized model... I would really appreciate a way to solve this.
I fixed it by excluding the failing nodes. Most of the failures were in the ConvInteger nodes.
import onnx
model_fp32_path = "./models/tusimple_18_V1_fp32.onnx"
model = onnx.load(model_fp32_path)
conv_nodes = []
for node in model.graph.node:
if node.op_type == "Conv":
conv_nodes.append(node.name)
conv_nodes
# ['/model/conv1/Conv',
# '/model/layer1/layer1.0/conv1/Conv',
# '/model/layer1/layer1.0/conv2/Conv',
# '/model/layer1/layer1.1/conv1/Conv',
# '/model/layer1/layer1.1/conv2/Conv',
# '/model/layer2/layer2.0/conv1/Conv',
# '/model/layer2/layer2.0/conv2/Conv',
# '/model/layer2/layer2.0/downsample/downsample.0/Conv',
# '/model/layer2/layer2.1/conv1/Conv',
# '/model/layer2/layer2.1/conv2/Conv',
# '/model/layer3/layer3.0/conv1/Conv',
# '/model/layer3/layer3.0/conv2/Conv',
# '/model/layer3/layer3.0/downsample/downsample.0/Conv',
# '/model/layer3/layer3.1/conv1/Conv',
# '/model/layer3/layer3.1/conv2/Conv',
# '/model/layer4/layer4.0/conv1/Conv',
# '/model/layer4/layer4.0/conv2/Conv',
# '/model/layer4/layer4.0/downsample/downsample.0/Conv',
# '/model/layer4/layer4.1/conv1/Conv',
# '/model/layer4/layer4.1/conv2/Conv',
# '/pool/Conv']
from onnxruntime.quantization import quantize_dynamic, QuantType
model_int8_path = "./models/model_int8_dynamic.onnx"
quantized_model = quantize_dynamic(
model_fp32_path, # Input model (FP32)
model_int8_path, # Output model
weight_type=QuantType.QInt8,
nodes_to_exclude=conv_nodes,
)
print("Dynamic Quantization Complete!")
And what is the inference speed, @abaoxomtieu? In my use case, partial quantization is slower than the non-quantized model on CPU.
It's faster than float16 on my laptop
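For anyone comparing speeds, here is a minimal timing sketch; the paths reuse the file names from the snippet above, and the input shape/dtype are placeholders to adjust to your model:
import time
import numpy as np
import onnxruntime as ort

def mean_latency(model_path, input_shape=(1, 3, 224, 224), runs=50):
    # Assumes a single float32 input; adjust shape/dtype to your network.
    sess = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    name = sess.get_inputs()[0].name
    x = np.random.rand(*input_shape).astype(np.float32)
    sess.run(None, {name: x})  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {name: x})
    return (time.perf_counter() - start) / runs

print('fp32:', mean_latency('./models/tusimple_18_V1_fp32.onnx'))
print('int8:', mean_latency('./models/model_int8_dynamic.onnx'))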