Thank you for reporting a PaddleNLP usage problem and for contributing to PaddleNLP! When filing your issue, please also provide the following information:

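Version and environment information
1) PaddleNLP and PaddlePaddle versions: PaddleNLP 2.3.4, paddlepaddle-gpu 2.3.1.post116
2) System environment: Windows 10 Enterprise, Python 3.8, CUDA 11.6, cuDNN 8.4
Reproduction information: quantizing the ernie3.0 model fails. The last call before the failure appears to go into a compiled C/C++ package, so I cannot debug any further. The error message is as follows: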

- Traceback (most recent call last):
  File "compress_msra_ner.py", line 149, in <module>
    main()
  File "compress_msra_ner.py", line 142, in main
    trainer.compress(output_dir,
  File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\compress_trainer.py", line 179, in compress
    self.quant(original_inference_model_dir, output_dir,
  File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\compress_trainer.py", line 201, in quant
    _post_training_quantization_grid_search(eval_dataloader, self.eval_dataset,
  File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\compress_trainer.py", line 623, in _post_training_quantization_grid_search
    _post_training_quantization(algo, batch_size)
    post_training_quantization.quantize()
  File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\contrib\slim\quantization\post_training_quantization.py", line 379, in quantize
    self._executor.run(program=self._program,
  File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 1300, in run
    six.reraise(*sys.exc_info())
  File "C:\Users\admin\AppData\Roaming\Python\Python38\site-packages\six.py", line 719, in reraise
    raise value
  File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 1286, in run
    res = self._run_impl(
  File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 1467, in _run_impl
    return new_exe.run(list(feed.keys()), fetch_list, return_numpy)
  File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 547, in run
    tensors = self._new_exe.run(feed_names, fetch_list)._move_to_list()
ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
  [Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType<T>::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)

Aug 08 '22 09:08 Fmaj7

Judging from the error message, these are the relevant lines. It looks like a dtype mismatch in the data:

ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
[Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType<T>::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)

First check whether the dtype the model's inputs expect matches the dtype of the data coming out of the dataset/data_loader. A common case is int32 versus int64.
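The check suggested above can be sketched as a small helper. This is only an illustration, not PaddleNLP code: `find_dtype_mismatches` and `FakeArray` are hypothetical names, and the expected dtypes are assumed to be whatever the exported model declares for its inputs. The helper relies only on each input exposing a `.dtype` attribute, which NumPy arrays and Paddle tensors both do.

```python
from typing import Any, Dict, List


def find_dtype_mismatches(expected: Dict[str, str], batch: Dict[str, Any]) -> List[str]:
    """Return one message per input whose dtype differs from what the model expects.

    `expected` maps input names to dtype names such as "int32" or "int64".
    `batch` maps input names to array-like objects with a `.dtype` attribute.
    """
    problems = []
    for name, want in expected.items():
        if name not in batch:
            problems.append(f"{name}: missing from batch")
            continue
        # Keep only the last dotted component so that spellings such as
        # "paddle.int64" and "int64" compare equal.
        got = str(batch[name].dtype).split(".")[-1]
        if got != want:
            problems.append(f"{name}: model expects {want}, batch has {got}")
    return problems


class FakeArray:
    """Hypothetical stand-in for an array so the sketch runs without NumPy or Paddle."""

    def __init__(self, dtype: str):
        self.dtype = dtype


# Assumed example: the model was exported with int32 inputs, the loader yields int64 ids.
expected_dtypes = {"input_ids": "int32", "token_type_ids": "int32"}
batch = {"input_ids": FakeArray("int64"), "token_type_ids": FakeArray("int32")}
mismatches = find_dtype_mismatches(expected_dtypes, batch)
print(mismatches)
```

In real use the `batch` would be one item taken from the data loader, and an empty result means the names and dtypes line up.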

Aug 08 '22 11:08 LiuChiachi

Thank you so much, changing it to int32 worked!

Aug 09 '22 00:08 Fmaj7

1. Type of the data coming out of the data_loader:
{'input_ids': Tensor(shape=[32, 127], dtype=int64, place=Place(gpu:0), stop_gradient=True
2. During pruning, the dtype is configured as int32:
elif quantization:
    input_dir = compress_config.quantization_config.input_dir
    if input_dir is None:
        compress_config.quantization_config.input_filename_prefix = "model"
        input_spec = [
            paddle.static.InputSpec(shape=[None, None], dtype="int32"),  # input_ids
            paddle.static.InputSpec(shape=[None, None], dtype="int32")  # segment_ids
        ]
3. Quantization succeeds.
4. With the set_dynamic_shape switch turned on to configure the dynamic shape automatically, a new problem appears, and it still looks like the int64 issue:
python infer_gpu.py --task_name token_cls --model_path ./msra_ner_quant_infer_model/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape
The error is:
Traceback (most recent call last):
  File "./deploy/python/infer_gpu.py", line 94, in <module>
    main()
  File "./deploy/python/infer_gpu.py", line 82, in main
    predictor = ErniePredictor(args)
  File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 296, in __init__
    self.set_dynamic_shape(args.max_seq_length, args.batch_size)
  File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 405, in set_dynamic_shape
    self.inference_backend.infer(batch)
  File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 203, in infer
    self.predictor.run()
RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}.
  [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712)
  [operator < quantize_linear > error]

Setting my dtype to int32 or int64 here does not work either:
`def token_cls_preprocess(self, data: list):
    # tokenizer + pad
    is_split_into_words = False
    if isinstance(data[0], list):
        is_split_into_words = True
    data = self.tokenizer(data, max_length=self.max_seq_length, padding=True, truncation=True, is_split_into_words=is_split_into_words)

    input_ids = data["input_ids"]
    token_type_ids = data["token_type_ids"]
    return {
        "input_ids": np.array(input_ids, dtype="int64"),
        "token_type_ids": np.array(token_type_ids, dtype="int64")
    }`

Aug 09 '22 01:08 Fmaj7

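Hello, the set_dynamic_shape function uses data that it constructs itself with dtype int64: https://github.com/PaddlePaddle/PaddleNLP/blob/2a4a2fb69f577d9622bdb51ecb44b98a5b0145da/model_zoo/ernie-3.0/deploy/python/ernie_predictor.py#L379 You can open the link to see the details. This is not your input data, so you may need to unify the dtype on the network-definition side.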

Aug 10 '22 06:08 LiuChiachi

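Hello, I'm a bit confused. When you say it is not the data I input, which input do you mean? And does unifying the dtype in the network mean unifying it inside the set_dynamic_shape function? I previously tried changing the dtype inside set_dynamic_shape, but got the same error.
One more note: the following warnings appear during quantization, and I don't know whether they matter:
Wed Aug 10 16:04:28-INFO: Collect quantized variable names ...
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: scale is not supported for quantization.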

Aug 10 '22 07:08 Fmaj7

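Q1: Which input does "not the data I input" refer to?
A1: set_dynamic_shape constructs its own data. The data used during set_dynamic_shape is unrelated to your input data; from the code, it constructs int64 data: https://github.com/PaddlePaddle/PaddleNLP/blob/2a4a2fb69f577d9622bdb51ecb44b98a5b0145da/model_zoo/ernie-3.0/deploy/python/ernie_predictor.py#L384-L389
Q2: Unifying the dtype in the network
A2: You still need to make sure that the input dtype the network expects matches the dtype of the data you actually feed it. If it still fails, send the code and we can look at it together.
Q3: The following warnings appear during quantization; do they matter?
A3: The warnings should have no effect.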
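To make A1 concrete, here is a rough sketch of what "constructing its own data" for dynamic-shape collection can look like when the dtype is a single parameter rather than a literal repeated in several places. It is not the code behind the link: `dynamic_shape_specs` is a hypothetical name, and the minimum sequence length of 2 and the "typical" length of 32 are assumptions made for illustration. Whether changing this dtype alone resolves the kernel error is not established in this thread.

```python
from typing import Dict, List, Tuple

# ((batch_size, seq_len), dtype name)
Spec = Tuple[Tuple[int, int], str]


def dynamic_shape_specs(max_seq_length: int, batch_size: int, dtype: str) -> List[Dict[str, Spec]]:
    """Describe the smallest, largest and typical dummy batches to run once each.

    Every input of every batch shares the one `dtype` argument, so it cannot
    drift away from the dtype the model was exported with.
    """
    shapes = [
        (1, 2),                                 # smallest batch (assumed minimum)
        (batch_size, max_seq_length),           # largest batch
        (batch_size, min(32, max_seq_length)),  # typical batch (assumed length 32)
    ]
    return [
        {name: (shape, dtype) for name in ("input_ids", "token_type_ids")}
        for shape in shapes
    ]


specs = dynamic_shape_specs(max_seq_length=128, batch_size=32, dtype="int32")
for spec in specs:
    print(spec["input_ids"])
```

Each `(shape, dtype)` pair would then be turned into a zero-filled array, for example with `np.zeros(shape, dtype=dtype)`, and passed through the predictor once.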

Aug 10 '22 13:08 LiuChiachi

Model training: run_msra_ner.py
python run_token_cls.py --task_name msra_ner --model_name_or_path ernie-3.0-medium-zh --do_train

Pruning: 1. compress_msra_ner.py 2. compress_trainer.py
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./

Quantization: in file 1 from the pruning step, set pruning=False, quantization=True in the compress call; in file 2, change the dtype to int32 (setting the dtype to int64 fails):
input_spec = [
    paddle.static.InputSpec(shape=[None, None], dtype="int32"),  # input_ids
    paddle.static.InputSpec(shape=[None, None], dtype="int32")  # segment_ids
]
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./

Deployment: ernie_predictor.py
python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape

Deployment fails with:
RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}.
  [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712)
  [operator < quantize_linear > error]

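There is also a GPU memory problem. Running the deployment script directly on the pruned model releases GPU memory when the run finishes:
python infer_gpu.py --task_name token_cls --model_path ./msra_ner_pruned_infer_model/float32
However, if I start a background service (an HTTP service) whose endpoint imports and calls infer_gpu.main, GPU memory is not released after the call returns, and it grows with every call, e.g. 1 GB -> 2 GB ..., until memory runs out.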
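One possible cause of growth like this, which nobody in the thread confirms, is that each request builds a fresh predictor because the whole `main` is called per request. A common pattern for a long-running service is to build the predictor once and reuse it. The sketch below shows only that pattern: `DummyPredictor`, `get_predictor` and `handle_request` are hypothetical names, and the dummy class stands in for the real predictor so the example runs without Paddle.

```python
import functools


class DummyPredictor:
    """Hypothetical stand-in for the real predictor; counts how often it is built."""

    instances = 0

    def __init__(self, model_path: str):
        DummyPredictor.instances += 1
        self.model_path = model_path

    def predict(self, text: str) -> str:
        return f"{self.model_path}:{len(text)}"


@functools.lru_cache(maxsize=None)
def get_predictor(model_path: str) -> DummyPredictor:
    """Build the predictor on first use and return the same object afterwards."""
    return DummyPredictor(model_path)


def handle_request(model_path: str, text: str) -> str:
    """What an HTTP handler would call: no model loading on the request path."""
    return get_predictor(model_path).predict(text)


for text in ("a", "bb", "ccc"):
    handle_request("./msra_ner_pruned_infer_model/float32", text)
print(DummyPredictor.instances)
```

With this structure the model is loaded once per process, so per-request memory growth would have to come from somewhere other than repeated predictor construction.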

Does ernie3 deployment currently support only seq and token?

Aug 11 '22 01:08 Fmaj7

Hello, sorry for the late reply. Try setting the onnx_format parameter in compress_trainer.py to False. The True case may not be supported yet, and we are investigating.

Aug 15 '22 03:08 LiuChiachi

It still fails with onnx_format set to False.

Aug 15 '22 03:08 Fmaj7


@Fmaj7 Could you set onnx_format to False, re-export the quantized model, and then post the error message from prediction?

Aug 15 '22 05:08 yghstill

With onnx_format=False, quantization itself fails, as shown below: [screenshot]

Aug 15 '22 06:08 Fmaj7

Judging from the error message, these are the relevant lines. It looks like a dtype mismatch in the data:
ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
[Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType<T>::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)
First check whether the dtype the model's inputs expect matches the dtype of the data coming out of the dataset/data_loader. A common case is int32 versus int64.

@Fmaj7 The error looks the same as this one. Could you try changing it as described here?

Aug 15 '22 06:08 yghstill

I've already tried that. With the dtype set to int32, quantization passes; with int64 it raises the error above. But after quantization completes with int32, running:
python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape
produces the following error:
RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}.
  [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712)
  [operator < quantize_linear > error]

Aug 15 '22 06:08 Fmaj7

The quantize_linear operator appears when onnx_format=True. You need to set the dtype to int32 and also set onnx_format=False.

Aug 15 '22 12:08 yghstill

Yes, with dtype=int32 and onnx_format=False quantization passes (in fact, in my tests setting only dtype=int32 was enough). But the --set_dynamic_shape step above fails again, as follows:
RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}.
  [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712)
  [operator < fake_quantize_dequantize_moving_average_abs_max > error]
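Since the same "does not have kernel" message recurs with different operators, a small parser can help when comparing runs. This is an illustrative sketch: `MISSING_KERNEL`, `SEEN_WITH_ONNX_FORMAT` and `parse_missing_kernel` are hypothetical names, the pattern is written against the messages quoted in this thread only, and the operator-to-setting table records what posters here reported rather than documented Paddle behavior.

```python
import re
from typing import Dict, Optional

# Pattern for the missing-kernel message as it appears in this thread.
MISSING_KERNEL = re.compile(
    r"Operator \((?P<op>\w+)\) does not have kernel for "
    r"\{data_type\[(?P<dtype>\w+)\];.*?place\[(?P<place>.*?)\];"
)

# Export setting each operator was reported with in this thread (not authoritative).
SEEN_WITH_ONNX_FORMAT = {
    "quantize_linear": True,
    "fake_quantize_dequantize_moving_average_abs_max": False,
}


def parse_missing_kernel(message: str) -> Optional[Dict[str, str]]:
    """Extract operator name, data type and place, or None if the message differs."""
    match = MISSING_KERNEL.search(message)
    return match.groupdict() if match else None


message = (
    "RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for "
    "{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; "
    "place[Place(gpu:0)]; library_type[PLAIN]}."
)
info = parse_missing_kernel(message)
print(info)
print(SEEN_WITH_ONNX_FORMAT.get(info["op"]))
```

In both reported cases the extracted data type is `int64_t`, which suggests that a quantization operator is receiving an integer input regardless of which export format was used.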

Aug 15 '22 13:08 Fmaj7

For this problem, you can first change every int64 in the set_dynamic_shape method of your ernie_predictor.py to int32 as well. That should work around it.

Aug 16 '22 02:08 LiuChiachi

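That doesn't work. I tried it a few days ago and again just now. This problem is really puzzling: is it a platform incompatibility or something else? [screenshot]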

Aug 16 '22 02:08 Fmaj7

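I switched to WSL for testing. With quantization parameters dtype=int64 and onnx_format=False, quantization passes, but running with --set_dynamic_shape still fails. It fails whether set_dynamic_shape in ernie_predictor.py uses int64 or int32 throughout, and the error message is the same as above.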

Aug 17 '22 05:08 Fmaj7

Could you also provide the .pdmodel file? In an ERNIE model, the input to the fake_quantize_dequantize_moving_average_abs_max operator indeed should not be int32.

Aug 17 '22 05:08 LiuChiachi

int8.zip This is the quantization output. Quantization parameters in compress_trainer.py: dtype=int64, onnx_format=False

Aug 17 '22 05:08 Fmaj7

Please confirm that onnx_format=False is set. It should be in the PostTrainingQuantization initialization in compress_trainer.py.

Aug 22 '22 08:08 LiuChiachi

Hello, has this problem been solved? I've run into the same problem.

Oct 11 '22 06:10 Renxs177

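Hello, has this problem been solved? I've run into the same problem.

Hello, please post a screenshot of the error so we can look at it together.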

Oct 11 '22 07:10 LiuChiachi

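I'm using PaddleSlim's automatic compression, and the compression strategy runs post-training (offline) quantization. It reports a similar error.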

Oct 11 '22 08:10 Renxs177

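It doesn't seem to have been solved. I later switched to in-batch-negative and deployed with Paddle Serving, without compression. Retrieval is still quite fast: the model is trained on GPU, and retrieval on CPU takes about 0.3 s.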

Oct 11 '22 09:10 Fmaj7

Has this problem been solved? I've run into a similar one. I quantized a Paddle-trained model directly (onnx_format=True, dtype=int64). Inference with the resulting quantized model fails with:
RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}.
  [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712)
  [operator < quantize_linear > error]

After changing to (onnx_format=False, dtype=int64), inference with the resulting quantized model fails with:
RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)];place[Place(gpu:0)]; library_type[PLAIN]}.
  [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712)
  [operator < fake_quantize_dequantize_moving_average_abs_max > error]

Nov 23 '22 09:11 tianjiahao

This issue is stale because it has been open for 60 days with no activity.

Jan 23 '23 00:01 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Feb 06 '23 00:02 github-actions[bot]

ernie3.0 quantization error: Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type()
