PaddleNLP
[Question]: UIE compression fails partway through with an error
I compressed the model with the following command:
python finetune.py \
--device cpu \
--logging_steps 10 \
--save_steps 100 \
--eval_steps 100 \
--seed 42 \
--model_name_or_path ./checkpoint/model_best \
--output_dir export \
--train_path data/train.txt \
--dev_path data/dev.txt \
--max_seq_length 512 \
--per_device_eval_batch_size 16 \
--per_device_train_batch_size 1 \
--num_train_epochs 1 \
--learning_rate 1e-5 \
--do_compress True \
--overwrite_output_dir \
--disable_tqdm True \
--metric_for_best_model eval_f1 \
--save_total_limit 1 \
--strategy 'qat'
The error output is as follows:
`[2022-11-04 07:04:34,524] [ INFO] - f1: 0.6206896551724138, precision: 0.782608695652174, recall: 0.6206896551724138
[2022-11-04 07:04:34,527] [ INFO] - eval done total: 41.88436722755432 s
[2022-11-04 07:05:16,472] [ INFO] - global step 510, epoch: 0, batch: 509, loss: 0.000004, speed: 0.12 step/s
[2022-11-04 07:05:58,469] [ INFO] - global step 520, epoch: 0, batch: 519, loss: 0.000007, speed: 0.24 step/s
[2022-11-04 07:06:39,883] [ INFO] - global step 530, epoch: 0, batch: 529, loss: 0.000008, speed: 0.24 step/s
[2022-11-04 07:07:22,352] [ INFO] - global step 540, epoch: 0, batch: 539, loss: 0.000049, speed: 0.24 step/s
[2022-11-04 07:08:07,073] [ INFO] - global step 550, epoch: 0, batch: 549, loss: 0.000022, speed: 0.22 step/s
[2022-11-04 07:08:17,428] [ INFO] - Best result: 0.6667
Traceback (most recent call last):
File "/usr/projects/uie-3496/finetune.py", line 342, in
File "/usr/projects/uie-3496/model.py", line 31, in forward
def forward(self, input_ids, token_type_ids, pos_ids=None, att_mask=None):
sequence_output, _ = self.encoder(input_ids=input_ids,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
token_type_ids=token_type_ids,
position_ids=pos_ids,
File "/home/icvip/.local/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/home/icvip/.local/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/tmp/tmpa2cqs75v.py", line 93, in auto_model_forward
] = paddle.jit.dy2static.convert_while_loop(for_loop_condition_0,
File "/home/icvip/.local/lib/python3.9/site-packages/paddle/fluid/dygraph/dygraph_to_static/convert_operators.py", line 45, in convert_while_loop
loop_vars = _run_py_while(cond, body, loop_vars)
File "/home/icvip/.local/lib/python3.9/site-packages/paddle/fluid/dygraph/dygraph_to_static/convert_operators.py", line 59, in _run_py_while
loop_vars = body(*loop_vars)
File "/tmp/tmpa2cqs75v.py", line 88, in for_loop_body_0
__for_loop_iter_var_0 = kwargs_keys[__for_loop_var_index_0]
TypeError: 'odict_keys' object is not subscriptable`
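The final `TypeError` can be reproduced in plain Python, independent of Paddle. The sketch below is not Paddle's generated code; it only assumes, based on the last traceback frame (`kwargs_keys[__for_loop_var_index_0]`), that the transformed code indexes into the result of `OrderedDict.keys()`. The names `kwargs`, `kwargs_keys`, and `first_key_by_index` are illustrative.

```python
from collections import OrderedDict

# An ordered mapping of keyword arguments, standing in for the encoder kwargs.
kwargs = OrderedDict(input_ids=1, token_type_ids=2)
kwargs_keys = kwargs.keys()  # an odict_keys view, not a list


def first_key_by_index(keys):
    """Index into the keys directly, as the last traceback frame does."""
    return keys[0]


try:
    first_key_by_index(kwargs_keys)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Materializing the view into a list makes it indexable.
first_key = list(kwargs_keys)[0]
print(first_key)
```

In Python 3 a keys view supports iteration but not integer indexing, which is why a loop rewritten to use an index fails here while an ordinary `for key in kwargs` loop would not.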
Could the GPU memory be overflowing?
I ran the compression on CPU.
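Hello, I hit the same problem. I ran the compression on GPU and got the same error, and GPU memory did not overflow. I set the number of epochs to 100, and the error occurred at the last step.
The last log lines before the error are: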
[2022-11-04 17:55:29,124] [ INFO] - global step 9690, epoch: 99, batch: 86, loss: 0.000001, speed: 4.03 step/s
[2022-11-04 17:55:31,612] [ INFO] - global step 9700, epoch: 99, batch: 96, loss: 0.000000, speed: 4.04 step/s
[2022-11-04 17:55:36,793] [ INFO] - f1: 0.6990881458966565, precision: 0.732484076433121, recall: 0.6990881458966565
[2022-11-04 17:55:36,795] [ INFO] - eval done total: 5.182710409164429 s
[2022-11-04 17:55:36,796] [ INFO] - Best result: 0.7130
Traceback (most recent call last):
File ".\finetune.py", line 289, in
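Hello~ I could not reproduce this in my local environment, so could you both share more environment details, such as the paddlepaddle, paddlenlp, and Python versions, and which UIE pretrained model you used for fine-tuning? @starryzwh, please also paste the error output if convenient.
It looks like upgrading to version 2.4rc0 resolves the problem.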
Hello, my environment is Windows 11, paddlepaddle-gpu==2.3.2, paddlenlp==2.4.1, paddleslim==2.3.4, python==3.8.
The detailed error output is below:
`[2022-11-04 17:55:26,605] [ INFO] - global step 9680, epoch: 99, batch: 76, loss: 0.000004, speed: 4.01 step/s
[2022-11-04 17:55:29,124] [ INFO] - global step 9690, epoch: 99, batch: 86, loss: 0.000001, speed: 4.03 step/s
[2022-11-04 17:55:31,612] [ INFO] - global step 9700, epoch: 99, batch: 96, loss: 0.000000, speed: 4.04 step/s
[2022-11-04 17:55:36,793] [ INFO] - f1: 0.6990881458966565, precision: 0.732484076433121, recall: 0.6990881458966565
[2022-11-04 17:55:36,795] [ INFO] - eval done total: 5.182710409164429 s
[2022-11-04 17:55:36,796] [ INFO] - Best result: 0.7130
Traceback (most recent call last):
File ".\finetune.py", line 289, in
File "D:\PaddleNLP_UIE\model.py", line 31, in forward
def forward(self, input_ids, token_type_ids, pos_ids=None, att_mask=None):
sequence_output, _ = self.encoder(input_ids=input_ids,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
token_type_ids=token_type_ids,
position_ids=pos_ids,
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "C:\Users\46383\AppData\Local\Temp\tmp377329ie.py", line 93, in auto_model_forward
] = paddle.jit.dy2static.convert_while_loop(for_loop_condition_0,
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\convert_operators.py", line 45, in convert_while_loop
loop_vars = _run_py_while(cond, body, loop_vars)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\convert_operators.py", line 59, in _run_py_while
loop_vars = body(*loop_vars)
File "C:\Users\46383\AppData\Local\Temp\tmp377329ie.py", line 88, in for_loop_body_0
__for_loop_iter_var_0 = kwargs_keys[__for_loop_var_index_0]
TypeError: 'odict_keys' object is not subscriptable`
Hello, my environment is Windows 11, paddlepaddle-gpu==2.3.2, paddlenlp==2.4.1, paddleslim==2.3.4, python==3.8.
The detailed error output is below:
`[2022-11-04 17:55:26,605] [ INFO] - global step 9680, epoch: 99, batch: 76, loss: 0.000004, speed: 4.01 step/s
[2022-11-04 17:55:29,124] [ INFO] - global step 9690, epoch: 99, batch: 86, loss: 0.000001, speed: 4.03 step/s
[2022-11-04 17:55:31,612] [ INFO] - global step 9700, epoch: 99, batch: 96, loss: 0.000000, speed: 4.04 step/s
[2022-11-04 17:55:36,793] [ INFO] - f1: 0.6990881458966565, precision: 0.732484076433121, recall: 0.6990881458966565
[2022-11-04 17:55:36,795] [ INFO] - eval done total: 5.182710409164429 s
[2022-11-04 17:55:36,796] [ INFO] - Best result: 0.7130
Traceback (most recent call last):
File ".\finetune.py", line 289, in
main()
File ".\finetune.py", line 285, in main
trainer.compress(custom_evaluate=custom_evaluate)
File "C:\Users\46383\software\miniconda3\envs\paddle_test\lib\site-packages\paddlenlp\trainer\trainer_compress.py", line 91, in compress
self.quant(args.output_dir, args.strategy)
File "C:\Users\46383\software\miniconda3\envs\paddle_test\lib\site-packages\paddlenlp\trainer\trainer_compress.py", line 101, in quant
_quant_aware_training_dynamic(self, model_dir)
File "C:\Users\46383\software\miniconda3\envs\paddle_test\lib\site-packages\paddlenlp\trainer\trainer_compress.py", line 752, in _quant_aware_training_dynamic
quanter.save_quantized_model(self.model,
File "C:\Users\46383\software\miniconda3\envs\paddle_test\lib\site-packages\paddleslim\dygraph\quant\qat.py", line 289, in save_quantized_model
self.imperative_qat.save_quantized_model(
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\contrib\slim\quantization\imperative\qat.py", line 273, in save_quantized_model
self._quantize_outputs.save_quantized_model(layer, path, input_spec,
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\contrib\slim\quantization\imperative\qat.py", line 483, in save_quantized_model
paddle.jit.save(layer=model, path=path, input_spec=input_spec, **config)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\jit.py", line 631, in wrapper
func(layer, path, input_spec, **configs)
File "C:\Users\46383\software\miniconda3\envs\paddle_test\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\base.py", line 51, in impl
return func(*args, **kwargs)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\jit.py", line 871, in save
concrete_program = static_forward.concrete_program_specify_input_spec(
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\program_translator.py", line 527, in concrete_program_specify_input_spec
concrete_program, _ = self.get_concrete_program(
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\program_translator.py", line 436, in get_concrete_program
concrete_program, partial_program_layer = self._program_cache[cache_key]
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\program_translator.py", line 801, in getitem
self._caches[item_id] = self._build_once(item)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\program_translator.py", line 785, in build_once
concrete_program = ConcreteProgram.from_func_spec(
File "C:\Users\46383\software\miniconda3\envs\paddle_test\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\base.py", line 51, in impl
return func(*args, **kwargs)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\program_translator.py", line 740, in from_func_spec
error_data.raise_new_exception()
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\error.py", line 336, in raise_new_exception
six.exec("raise new_exception from None")
File "", line 1, in
TypeError: In transformed code:
File "D:\PaddleNLP_UIE\model.py", line 31, in forward
def forward(self, input_ids, token_type_ids, pos_ids=None, att_mask=None):
sequence_output, _ = self.encoder(input_ids=input_ids,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
token_type_ids=token_type_ids,
position_ids=pos_ids,
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "C:\Users\46383\AppData\Local\Temp\tmp377329ie.py", line 93, in auto_model_forward
] = paddle.jit.dy2static.convert_while_loop(for_loop_condition_0,
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\convert_operators.py", line 45, in convert_while_loop
loop_vars = _run_py_while(cond, body, loop_vars)
File "C:\Users\46383\AppData\Roaming\Python\Python38\site-packages\paddle\fluid\dygraph\dygraph_to_static\convert_operators.py", line 59, in _run_py_while
loop_vars = body(*loop_vars)
File "C:\Users\46383\AppData\Local\Temp\tmp377329ie.py", line 88, in for_loop_body_0
__for_loop_iter_var_0 = kwargs_keys[__for_loop_var_index_0]
TypeError: 'odict_keys' object is not subscriptable`
Upgrade paddle to the latest 2.4rc version.
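Before re-running the compression, it may help to confirm which Paddle build is actually installed, since the failing environment in this thread reported paddlepaddle-gpu==2.3.2. The following is a minimal sketch using only the standard library; `version_key`, `installed_version`, and `MIN_PADDLE` are hypothetical helper names, and the simple parser only understands versions shaped like `2.3.2` or `2.4.0rc0`.

```python
import re
from importlib import metadata

MIN_PADDLE = "2.4.0rc0"  # version suggested in this thread


def version_key(version):
    """Turn '2.3.2' or '2.4.0rc0' into a sortable tuple (rc sorts before final)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:rc(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, rc = match.groups()
    # A final release has no rc number, so rank it above every release candidate.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch or 0), rc_rank)


def installed_version(*names):
    """Return the version of the first installed distribution in names, else None."""
    for name in names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


found = installed_version("paddlepaddle-gpu", "paddlepaddle")
if found is None:
    print("paddlepaddle is not installed in this environment")
elif version_key(found) < version_key(MIN_PADDLE):
    print(f"paddlepaddle {found} is older than {MIN_PADDLE}; consider upgrading")
else:
    print(f"paddlepaddle {found} is at least {MIN_PADDLE}")
```

Running this inside the same conda or virtual environment used for `finetune.py` matters, because the tracebacks above show packages loaded from more than one site-packages directory.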
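Hello, following your suggestion I used conda install paddle paddlepaddle-gpu to install version 2.4.0rc0. When I run finetune for model compression on GPU, the INFO log reaches the device line and then the process exits immediately. On CPU, the log gets past the device line and continues, but it is extremely slow. Details are below: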
python .\finetune.py --train_path .\data\train.txt --dev_path .\data\dev.txt --output_dir .\checkpoint\model_best --learning_rate 1e-5 --per_device_eval_batch_size 8 --per_device_train_batch_size 8 --max_seq_len 512 --num_train_epochs 50 --model_name_or_path .\checkpoint\model_best --seed 1000 --logging_steps 10 --eval_steps 100 --save_steps 100 --device gpu --do_compress --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --save_total_limit 1 --strategy qat
[2022-11-07 16:29:59,313] [ INFO] - ============================================================
[2022-11-07 16:29:59,316] [ INFO] - Model Configuration Arguments
[2022-11-07 16:29:59,317] [ INFO] - paddle commit id :083853cd4e4a9bdad22c70fa48eb9a036d2def27
[2022-11-07 16:29:59,318] [ INFO] - export_model_dir :None
[2022-11-07 16:29:59,318] [ INFO] - model_name_or_path :.\checkpoint\model_best
[2022-11-07 16:29:59,318] [ INFO] - multilingual :False
[2022-11-07 16:29:59,318] [ INFO] -
[2022-11-07 16:29:59,319] [ INFO] - ============================================================
[2022-11-07 16:29:59,319] [ INFO] - Data Configuration Arguments
[2022-11-07 16:29:59,319] [ INFO] - paddle commit id :083853cd4e4a9bdad22c70fa48eb9a036d2def27
[2022-11-07 16:29:59,320] [ INFO] - dev_path :.\data\dev.txt
[2022-11-07 16:29:59,320] [ INFO] - max_seq_length :512
[2022-11-07 16:29:59,320] [ INFO] - train_path :.\data\train.txt
[2022-11-07 16:29:59,321] [ INFO] -
[2022-11-07 16:29:59,321] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: False
[2022-11-07 16:29:59,322] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load '.\checkpoint\model_best'.
W1107 16:29:59.363667 20828 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 10.2
W1107 16:29:59.423164 20828 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
(paddle_test) PS D:\信息抽取-PaddleNLP_UIE>
That is the GPU case, where the process exits immediately.
[2022-11-07 15:15:38,460] [ INFO] - Gradient Accumulation steps = 1
[2022-11-07 15:15:38,460] [ INFO] - Total optimization steps = 3860.0
[2022-11-07 15:15:38,460] [ INFO] - Total num train samples = 15440
[2022-11-07 15:19:04,664] [ INFO] - loss: 0.0056895, learning_rate: 1e-05, global_step: 10, interval_runtime: 205.5363, interval_samples_per_second: 0.019, interval_steps_per_second: 0.049, epoch: 0.0518
[2022-11-07 15:22:25,579] [ INFO] - loss: 0.00726472, learning_rate: 1e-05, global_step: 20, interval_runtime: 197.9083, interval_samples_per_second: 0.02, interval_steps_per_second: 0.051, epoch: 0.1036
[2022-11-07 15:25:52,342] [ INFO] - loss: 0.00368861, learning_rate: 1e-05, global_step: 30, interval_runtime: 206.7552, interval_samples_per_second: 0.019, interval_steps_per_second: 0.048, epoch: 0.1554
That is the CPU case. Why does the process exit immediately on GPU?
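Does fine-tuning on GPU work on your side? @starryzwh
@yuwochangzai Could you update paddlepaddle to 2.4.0rc0 and try again?
Fine-tuning on GPU works for me, and I have already updated to 2.4.0rc0. I just saw in another issue that @wawltor said the develop version of paddlenlp is required, so I am trying that approach now.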
It worked, thanks!
@starryzwh How did you solve the immediate-exit problem? I have run into the same issue.