ernie3.0 quantization fails with: Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type()
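- Version and environment: 1) PaddleNLP and PaddlePaddle versions: PaddleNLP 2.3.4, paddlepaddle-gpu 2.3.1.post116. 2) System environment: Windows 10 Enterprise, Python 3.8, CUDA 11.6, cuDNN 8.4.
- Reproduction: quantizing the ernie3.0 model fails. The last call in the failing path appears to go into the compiled C/C++ package, so I could not investigate any further. The error message is as follows: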
- Traceback (most recent call last):
File "compress_msra_ner.py", line 149, in <module>
main()
File "compress_msra_ner.py", line 142, in main
trainer.compress(output_dir,
File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\compress_trainer.py", line 179, in compress
self.quant(original_inference_model_dir, output_dir,
File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\compress_trainer.py", line 201, in quant
_post_training_quantization_grid_search(eval_dataloader, self.eval_dataset,
File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\compress_trainer.py", line 623, in _post_training_quantization_grid_search
_post_training_quantization(algo, batch_size)
post_training_quantization.quantize()
File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\contrib\slim\quantization\post_training_quantization.py", line 379, in quantize
self._executor.run(program=self._program,
File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 1300, in run
six.reraise(*sys.exc_info())
File "C:\Users\admin\AppData\Roaming\Python\Python38\site-packages\six.py", line 719, in reraise
raise value
File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 1286, in run
res = self._run_impl(
File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 1467, in _run_impl
return new_exe.run(list(feed.keys()), fetch_list, return_numpy)
File "C:\Program Files\Python38\lib\site-packages\paddle\fluid\executor.py", line 547, in run
tensors = self._new_exe.run(feed_names, fetch_list)._move_to_list()
ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
[Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType<T>::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)
Looking at the error message, these are the relevant lines. It looks like a data dtype mismatch:
ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
[Hint: Expected dtype() == paddle::experimental::CppTypeToDataType::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)
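First check whether the dtype the model expects for its inputs matches the dtype of the data coming out of the dataset/data_loader. Common candidates are int32 and int64.
Many thanks, changing it to int32 worked!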
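The dtype check suggested above can be sketched as follows. This is a minimal illustration, not code from PaddleNLP: the helper names `find_dtype_mismatches` and `cast_batch` are hypothetical, NumPy arrays stand in for the data_loader output, and the expected dtypes are an assumption about what the exported model declares.

```python
import numpy as np


def find_dtype_mismatches(batch, expected):
    """Return {name: (actual, expected)} for every input whose dtype differs."""
    mismatches = {}
    for name, want in expected.items():
        got = str(np.asarray(batch[name]).dtype)
        if got != want:
            mismatches[name] = (got, want)
    return mismatches


def cast_batch(batch, expected):
    """Return a copy of the batch with each listed input cast to its expected dtype."""
    out = dict(batch)
    for name, want in expected.items():
        out[name] = np.asarray(batch[name]).astype(want)
    return out


# Hypothetical batch with int64 ids, like the data_loader output reported in this thread.
batch = {
    "input_ids": np.zeros((32, 127), dtype="int64"),
    "token_type_ids": np.zeros((32, 127), dtype="int64"),
}
# Assumption: the exported model declares int32 inputs.
expected = {"input_ids": "int32", "token_type_ids": "int32"}

mismatches = find_dtype_mismatches(batch, expected)
fixed = cast_batch(batch, expected)
```

Running the check before calling the quantization step makes the mismatch visible in Python rather than as a C++ `dense_tensor.cc` error.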
1. The type coming out of the data_loader: {'input_ids': Tensor(shape=[32, 127], dtype=int64, place=Place(gpu:0), stop_gradient=True
2. When pruning, the dtype is configured as int32:
    elif quantization:
        input_dir = compress_config.quantization_config.input_dir
        if input_dir is None:
            compress_config.quantization_config.input_filename_prefix = "model"
            input_spec = [
                paddle.static.InputSpec(shape=[None, None], dtype="int32"),  # input_ids
                paddle.static.InputSpec(shape=[None, None], dtype="int32")  # segment_ids
            ]
3. Quantization succeeds.
4. After turning on the set_dynamic_shape switch to configure the dynamic shape automatically, a new problem appears, and it still looks like the int64 issue:
    python infer_gpu.py --task_name token_cls --model_path ./msra_ner_quant_infer_model/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape
The error is as follows:
    Traceback (most recent call last):
      File "./deploy/python/infer_gpu.py", line 94, in <module>
        main()
      File "./deploy/python/infer_gpu.py", line 82, in main
        predictor = ErniePredictor(args)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 296, in __init__
        self.set_dynamic_shape(args.max_seq_length, args.batch_size)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 405, in set_dynamic_shape
        self.inference_backend.infer(batch)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 203, in infer
        self.predictor.run()
    RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
Setting the dtype in my preprocessing to int32 or int64 does not help either: `def token_cls_preprocess(self, data: list):
    # tokenizer + pad
    is_split_into_words = False
    if isinstance(data[0], list):
        is_split_into_words = True
    data = self.tokenizer(data, max_length=self.max_seq_length, padding=True, truncation=True, is_split_into_words=is_split_into_words)
    input_ids = data["input_ids"]
    token_type_ids = data["token_type_ids"]
    return {
        "input_ids": np.array(input_ids, dtype="int64"),
        "token_type_ids": np.array(token_type_ids, dtype="int64")
    }`
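Hello, the set_dynamic_shape function uses int64 data that it constructs itself: https://github.com/PaddlePaddle/PaddleNLP/blob/2a4a2fb69f577d9622bdb51ecb44b98a5b0145da/model_zoo/ernie-3.0/deploy/python/ernie_predictor.py#L379
You can open the link and look at the details.
It is not your input data. You may need to unify the dtype in the network definition here.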
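Hello, I'm a bit confused. When you say it is not my input data, which input do you mean? And does unifying the dtype in the network mean unifying it inside the set_dynamic_shape function? I previously tried changing the dtype inside set_dynamic_shape, but got the same error. One more thing: the following warnings appear during quantization, and I don't know whether they matter:
Wed Aug 10 16:04:28-INFO: Collect quantized variable names ...
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: scale is not supported for quantization.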
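Q1: When you say it is not my input data, which input do you mean?
A1: set_dynamic_shape constructs its own data. The data used during set_dynamic_shape is unrelated to your input data. From the code, it constructs int64 data: https://github.com/PaddlePaddle/PaddleNLP/blob/2a4a2fb69f577d9622bdb51ecb44b98a5b0145da/model_zoo/ernie-3.0/deploy/python/ernie_predictor.py#L384-L389
Q2: Unifying the dtype in the network.
A2: You still need to make sure that the input dtype the network expects matches the dtype of the data you actually feed it. If it still fails, send the code and we can look at it together.
Q3: The following warnings appear during quantization, and I don't know whether they matter.
A3: The warnings should not have any effect.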
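To illustrate A1 and A2, here is a minimal sketch of building dummy batches for shape collection with a configurable dtype, so the probe data can follow whatever dtype the exported model declares. It is not the code in `ernie_predictor.py`: the function name `make_probe_batches` and the chosen shapes are assumptions for illustration. Note that the reporter says changing the dtype inside set_dynamic_shape did not remove the error in their setup, so this may not be sufficient on its own.

```python
import numpy as np


def make_probe_batches(max_seq_length, batch_size, dtype="int32"):
    """Build smallest, largest, and typical dummy batches for shape collection.

    The (batch, sequence) sizes below are illustrative assumptions, not the
    values used by ernie_predictor.py.
    """
    shapes = [
        (1, 1),                                     # smallest batch
        (batch_size, max_seq_length),               # largest batch
        (batch_size, max(1, max_seq_length // 2)),  # a typical batch
    ]
    return [
        {
            "input_ids": np.zeros(shape, dtype=dtype),
            "token_type_ids": np.zeros(shape, dtype=dtype),
        }
        for shape in shapes
    ]


# Assumption: the model was exported with int32 inputs.
batches = make_probe_batches(max_seq_length=128, batch_size=32, dtype="int32")
```

Keeping the dtype as a single parameter avoids having one hard-coded dtype in the probe data and another in the preprocessing function.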
Model training: run_msra_ner.py
python run_token_cls.py --task_name msra_ner --model_name_or_path ernie-3.0-medium-zh --do_train
Pruning: 1. compress_msra_ner.py 2. compress_trainer.py
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./
Quantization: in file 1 from the pruning step, set compress to pruning=False, quantization=True. In file 2, change the dtype to int32 (setting the dtype to int64 fails): input_spec = [ paddle.static.InputSpec(shape=[None, None], dtype="int32"), # input_ids paddle.static.InputSpec(shape=[None, None], dtype="int32") # segment_ids ]
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./
Deployment: ernie_predictor.py
python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape
Deployment fails with: RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
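There is also a GPU memory problem. Running the deployment script directly on the pruned model releases the GPU memory once the run finishes: python infer_gpu.py --task_name token_cls --model_path ./msra_ner_pruned_infer_model/float32 However, if I start a background service (an HTTP service) and have an endpoint import and run infer_gpu.main, the GPU memory is not released after the call, and every call adds more, e.g. 1 GB -> 2 GB ... until memory is exhausted.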
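One possible explanation, not confirmed in this thread, is that each call to `infer_gpu.main` builds a new `ErniePredictor` (the traceback above shows `main` constructing it), so a long-running service keeps allocating. A common pattern is to build the predictor once and reuse it across requests. The sketch below shows that pattern with a hypothetical `PredictorCache` and a `FakePredictor` stand-in; neither is part of PaddleNLP.

```python
import threading


class PredictorCache:
    """Build an expensive predictor once per key and reuse it across requests."""

    def __init__(self, factory):
        self._factory = factory        # callable that builds a predictor from a key
        self._lock = threading.Lock()  # guards construction under concurrent requests
        self._predictors = {}

    def get(self, key):
        with self._lock:
            if key not in self._predictors:
                self._predictors[key] = self._factory(key)
            return self._predictors[key]


class FakePredictor:
    """Hypothetical stand-in for ErniePredictor; counts how often it is built."""

    instances = 0

    def __init__(self, model_path):
        FakePredictor.instances += 1
        self.model_path = model_path

    def predict(self, texts):
        # Placeholder result: one number per input text.
        return [len(t) for t in texts]


cache = PredictorCache(FakePredictor)


def handle_request(texts, model_path="./msra_ner_pruned_infer_model/float32"):
    # Every request reuses the cached predictor instead of rebuilding it.
    return cache.get(model_path).predict(texts)


first = handle_request(["北京", "上海市"])
second = handle_request(["广州"])
```

In a real service the factory would construct the actual predictor from the parsed arguments; whether this removes the memory growth would need to be verified.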
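Does ernie3 deployment currently support only seq and token?
Hello, sorry for the late reply. Please try setting the onnx_format parameter in compress_trainer.py to False. The True case may not be supported yet and is being investigated.
It still fails with onnx_format set to False.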
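@Fmaj7 With onnx_format set to False and the quantized model re-exported, could you post the error message you get at prediction time?
With onnx_format=False, running quantization fails, as follows: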

Looking at the error message, these are the relevant lines. It looks like a data dtype mismatch:
ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container. [Hint: Expected dtype() == paddle::experimental::CppTypeToDataType::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)
First check whether the dtype the model expects for its inputs matches the dtype of the data coming out of the dataset/data_loader. Common candidates are int32 and int64.
@Fmaj7 The error looks the same as this one. Could you try changing it as described there?
I have already tried that. With the dtype set to int32, quantization passes; with int64, it reports the error above. But after quantization completes with int32, running python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape produces the following error: RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
The quantize_linear operator appears when onnx_format=True. You need to set the dtype to int32 and also set onnx_format=False.
Yes, with dtype=int32 and onnx_format=False quantization passes (in fact, in my tests setting only dtype=int32 was enough for quantization to pass), but --set_dynamic_shape fails again, as follows: RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < fake_quantize_dequantize_moving_average_abs_max > error]
For this problem, you can first change every int64 in the set_dynamic_shape method of your ernie_predictor.py to int32 as well. That should work around it.
That doesn't work. I tried it a few days ago and again just now. This problem is really puzzling. Is it a platform incompatibility, or something else?

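I switched to WSL for testing. With quantization parameters dtype=int64, onnx_format=False, quantization passes, but running --set_dynamic_shape still fails. In ernie_predictor.py, it fails whether set_dynamic_shape uses int64 throughout or int32 throughout, with the same error as above.
Could you also provide the .pdmodel file? With the ERNIE model, the input to the fake_quantize_dequantize_moving_average_abs_max operator indeed should not be int32.
int8.zip This is the quantization output. Quantization parameters in compress_trainer.py: dtype=int64, onnx_format=False
Please confirm that onnx_format=False is set. It should be in the file compress_trainer.py,
in the initialization of PostTrainingQuantization.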
Hello, has this problem been solved? I have run into the same issue.
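Hello, please post a screenshot of the error so we can look at it together.
I am using PaddleSlim's automatic compression, and the compression strategy being run is offline (post-training) quantization. It reports a similar error.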

It doesn't seem to have been solved. I later used in-batch-negative and deployed with Paddle Serving, without compression. Retrieval is still quite fast: the model is trained on GPU, and retrieval on CPU takes about 0.3 s.
Has this problem been solved? I have run into a similar one. I quantized a Paddle-trained model directly (onnx_format=True, dtype=int64), and after obtaining the quantized model, inference fails with: RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
After changing to (onnx_format=False, dtype=int64), inference with the resulting quantized model fails with: RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)];place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < fake_quantize_dequantize_moving_average_abs_max > error]
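The kernel-not-found errors reported in this thread all share one format, differing only in the operator name. When comparing runs across onnx_format and dtype settings, it can help to pull those fields out programmatically. The following is a small sketch using only the standard library; the helper name `parse_kernel_not_found` is hypothetical, and the pattern is written against the messages quoted in this thread rather than any documented format.

```python
import re

# Pattern written against the error text quoted in this thread.
KERNEL_NOT_FOUND = re.compile(
    r"Operator \((?P<op>\w+)\) does not have kernel for \{"
    r"data_type\[(?P<dtype>[^\]]+)\];\s*"
    r"data_layout\[(?P<layout>[^\]]+)\];\s*"
    r"place\[(?P<place>[^\]]+)\];\s*"
    r"library_type\[(?P<library>[^\]]+)\]\}"
)


def parse_kernel_not_found(message):
    """Return the operator, dtype, layout, place and library, or None if no match."""
    match = KERNEL_NOT_FOUND.search(message)
    return match.groupdict() if match else None


msg = (
    "RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for "
    "{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; "
    "library_type[PLAIN]}."
)
info = parse_kernel_not_found(msg)
```

Here `info["op"]` and `info["dtype"]` identify which operator received which dtype, which is the pair the maintainers ask about throughout the discussion.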
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.