PaddleNLP

[Bug]: Error during the evaluation stage after training

Open jqtian123 opened this issue 10 months ago • 7 comments

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 3.0.0rc1
- paddlenlp: 3.0.0b3

Duplicate issues

  • [x] I have searched the existing issues

Error description

Data I constructed myself:
python doccano.py --negative_ratio 5 --doccano_file ./data/doccano_ext.jsonl --task_type ext --save_dir ./data --splits 0.8 0.1 0.1 --schema_lang en
Training command:
python finetune.py \
    --device gpu \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --seed 42 \
    --model_name_or_path uie-m-large \
    --output_dir $finetuned_model \
    --train_path data/train.txt \
    --dev_path data/dev.txt \
    --max_seq_length 512 \
    --per_device_eval_batch_size 8 \
    --per_device_train_batch_size 8 \
    --num_train_epochs 20 \
    --learning_rate 1e-5 \
    --label_names "start_positions" "end_positions" \
    --do_train \
    --do_eval \
    --do_export \
    --export_model_dir $finetuned_model \
    --overwrite_output_dir \
    --disable_tqdm True \
    --metric_for_best_model eval_f1 \
    --load_best_model_at_end True \
    --save_total_limit 1
Error:
[2025-02-20 06:44:11,305] [    INFO] - ***** Running Evaluation *****
[2025-02-20 06:44:11,305] [    INFO] -   Num examples = 170
[2025-02-20 06:44:11,305] [    INFO] -   Total prediction steps = 22
[2025-02-20 06:44:11,306] [    INFO] -   Pre device batch size = 8
[2025-02-20 06:44:11,306] [    INFO] -   Total Batch size = 8
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
label_ids contains 1
<class 'numpy.ndarray'>
(170, 512)
Traceback (most recent call last):
  File "/root/autodl-tmp/PaddleNLP/slm/model_zoo/uie/finetune.py", line 269, in <module>
    main()
  File "/root/autodl-tmp/PaddleNLP/slm/model_zoo/uie/finetune.py", line 200, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/miniconda3/envs/uie/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 872, in train
    return self._inner_training_loop(
  File "/root/miniconda3/envs/uie/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 1231, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, epoch, ignore_keys_for_eval, inputs=inputs)
  File "/root/miniconda3/envs/uie/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 1521, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/root/miniconda3/envs/uie/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 2989, in evaluate
    output = self.evaluation_loop(
  File "/root/miniconda3/envs/uie/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 3208, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=batch_labels))
  File "/root/autodl-tmp/PaddleNLP/slm/model_zoo/uie/finetune.py", line 168, in compute_metrics
    start_ids, end_ids = p.label_ids
ValueError: too many values to unpack (expected 2)

Steps to reproduce & code

The error occurs when unpacking p.label_ids.
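
The failure can be illustrated outside the trainer. The debug output in the log shows label_ids as a single numpy.ndarray of shape (170, 512), while compute_metrics unpacks it into two values. The sketch below is only an illustration: it uses plain nested lists as stand-ins for the numpy arrays, and the helper name make_labels is hypothetical.

```python
# Pure-Python stand-ins for the label arrays (the real objects are numpy arrays).
NUM_EXAMPLES, SEQ_LEN = 170, 512  # sizes taken from the log in this issue


def make_labels():
    """Hypothetical helper: build a NUM_EXAMPLES x SEQ_LEN grid of zeros."""
    return [[0.0] * SEQ_LEN for _ in range(NUM_EXAMPLES)]


start_positions = make_labels()
end_positions = make_labels()

# What compute_metrics expects: a pair (start labels, end labels).
label_pair = (start_positions, end_positions)
start_ids, end_ids = label_pair  # unpacks without error

# What it appears to have received: only one 2-D array of 170 rows.
single_array = label_pair[0]
try:
    # Unpacking iterates over the 170 rows, not over two label arrays.
    start_ids, end_ids = single_array
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Unpacking the pair succeeds, while unpacking the single array raises the same kind of ValueError as in the traceback, which suggests that only one of the two label arrays reached compute_metrics.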

jqtian123 avatar Feb 19 '25 23:02 jqtian123

Hello, what format is your dataset in? Is it consistent with the format given in the documentation?

ZHUI avatar Mar 17 '25 08:03 ZHUI

Has the original poster solved this? I have the same problem, and I am using the provided sample data.

Yuuko-kurisu avatar Mar 28 '25 04:03 Yuuko-kurisu

Has the original poster solved this? I have the same problem, and I am using the provided sample data.

+1

li5hu1in avatar Apr 02 '25 09:04 li5hu1in

I ran into this problem too. Has anyone solved it?

undefinedspacex avatar Apr 12 '25 08:04 undefinedspacex

Environment: paddlepaddle-gpu==3.0.0; paddlenlp==3.0.0b4. In trainer.py, line 3339, I changed batch_labels = all_labels[0] if isinstance(all_labels, (list, tuple)) else all_labels to batch_labels = all_labels. After that, training runs normally.
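
To see why this change makes a difference, here is a minimal sketch that copies only the line quoted from trainer.py and the unpacking line from the traceback. It is not the PaddleNLP code itself: the three function names are hypothetical and the small nested lists stand in for the real label arrays.

```python
def select_labels_original(all_labels):
    # The line quoted from trainer.py: keep only the first label array.
    return all_labels[0] if isinstance(all_labels, (list, tuple)) else all_labels


def select_labels_patched(all_labels):
    # The reported change: pass the labels through untouched.
    return all_labels


def unpack_like_compute_metrics(label_ids):
    # Mirrors `start_ids, end_ids = p.label_ids` from finetune.py.
    start_ids, end_ids = label_ids
    return start_ids, end_ids


# Tiny stand-ins: 3 examples, sequence length 3.
start = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
end = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
all_labels = (start, end)  # two label names -> two arrays

# Patched selection keeps both arrays, so unpacking works.
patched_ok = unpack_like_compute_metrics(select_labels_patched(all_labels))

# Original selection drops the end labels, so unpacking the 3 rows fails.
try:
    unpack_like_compute_metrics(select_labels_original(all_labels))
    original_failed = False
except ValueError:
    original_failed = True

print(original_failed)
```

With two label names (start_positions and end_positions), the original selection keeps only the first array, which is what breaks the two-value unpacking. Note that the edit is made to the installed package, so it affects every script in that environment; whether other tasks depend on the original behavior is not established in this thread.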

li5hu1in avatar Apr 14 '25 02:04 li5hu1in

Environment: paddlepaddle-gpu==3.0.0; paddlenlp==3.0.0b4. In trainer.py, line 3339, I changed batch_labels = all_labels[0] if isinstance(all_labels, (list, tuple)) else all_labels to batch_labels = all_labels. After that, training runs normally.

Thanks. After making this change, training works normally for me too.

undefinedspacex avatar Apr 14 '25 03:04 undefinedspacex

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Jun 14 '25 00:06 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Jun 29 '25 00:06 github-actions[bot]