
[Question]: How do I use the model after fine-tuning?

hehuang139 opened this issue 3 years ago • 3 comments

I am fine-tuning the ernie_layout model for doc_vqa. Since full training takes too long, I only trained for 1 epoch to check the effect, and I am now at checkpoint 40000. After training, how do I use this fine-tuned model to run prediction?

  1. The docs don't explain how to run prediction directly with the fine-tuned model, i.e. by specifying a checkpoint. I only see that do_train can resume from a checkpoint, but do_predict does not seem to use the model under output_dir/checkpoint. If I pass --do_predict, which parameters do I need and how should I change them? Please advise.
  2. Is it mandatory to convert the dynamic graph to static form and deploy it before prediction? Normally that shouldn't be necessary, but I couldn't find any documentation; could you point me to it?
  3. I also tried export_model, but it errors out. Below is my output_dir. Do I have to finish training before I can export and predict?

├── ernie-layoutx-base-uncased
│   └── models
│       └── docvqa_zh
│           ├── checkpoint-30000
│           │   ├── model_config.json
│           │   ├── model_state.pdparams
│           │   ├── optimizer.pdopt
│           │   ├── rng_state.pth
│           │   ├── scheduler.pdparams
│           │   ├── sentencepiece.bpe.model
│           │   ├── special_tokens_map.json
│           │   ├── tokenizer_config.json
│           │   ├── trainer_state.json
│           │   ├── training_args.bin
│           │   └── vocab.txt
│           ├── checkpoint-40000
│           │   ├── model_config.json
│           │   ├── model_state.pdparams
│           │   ├── optimizer.pdopt
│           │   ├── rng_state.pth
│           │   ├── scheduler.pdparams
│           │   ├── sentencepiece.bpe.model
│           │   ├── special_tokens_map.json
│           │   ├── tokenizer_config.json
│           │   ├── trainer_state.json
│           │   ├── training_args.bin
│           │   └── vocab.txt
│           ├── eval_golden_labels.json
│           ├── eval_predictions.json
│           └── runs

/usr/bin/env /home/hehuang/dev/env/anaconda3/envs/paddlenlp-gpu/bin/python /home/hehuang/.vscode/extensions/ms-python.python-2022.4.1/pythonFiles/lib/python/debugpy/launcher 38467 -- /home/hehuang/dev/git/PaddleNLP/model_zoo/ernie-layout/export_model.py --model_path ./ernie-layoutx-base-uncased/models/docvqa_zh/ --task_type mrc --output_path ./mrc_export

[2022-10-27 15:49:56,644] [ INFO] - Downloading model_config.json from https://bj.bcebos.com/paddlenlp/models/community/./ernie-layoutx-base-uncased/models/docvqa_zh/model_config.json
[2022-10-27 15:49:56,871] [ ERROR] - Downloading from https://bj.bcebos.com/paddlenlp/models/community/./ernie-layoutx-base-uncased/models/docvqa_zh/model_config.json failed with code 404!
Traceback (most recent call last):
  File "/home/hehuang/dev/git/PaddleNLP/paddlenlp/transformers/auto/modeling.py", line 278, in _from_pretrained
    resolved_vocab_file = get_path_from_url(community_config_path,
  File "/home/hehuang/dev/git/PaddleNLP/paddlenlp/utils/downloader.py", line 164, in get_path_from_url
    fullpath = _download(url, root_dir, md5sum)
  File "/home/hehuang/dev/git/PaddleNLP/paddlenlp/utils/downloader.py", line 200, in _download
    raise RuntimeError("Downloading from {} failed with code "
RuntimeError: Downloading from https://bj.bcebos.com/paddlenlp/models/community/./ernie-layoutx-base-uncased/models/docvqa_zh/model_config.json failed with code 404!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hehuang/dev/env/anaconda3/envs/paddlenlp-gpu/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/hehuang/dev/env/anaconda3/envs/paddlenlp-gpu/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/hehuang/.vscode/extensions/ms-python.python-2022.4.1/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/hehuang/.vscode/extensions/ms-python.python-2022.4.1/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/hehuang/.vscode/extensions/ms-python.python-2022.4.1/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home/hehuang/dev/env/anaconda3/envs/paddlenlp-gpu/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/hehuang/dev/env/anaconda3/envs/paddlenlp-gpu/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/hehuang/dev/env/anaconda3/envs/paddlenlp-gpu/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/hehuang/dev/git/PaddleNLP/model_zoo/ernie-layout/export_model.py", line 33, in <module>
    model = AutoModelForQuestionAnswering.from_pretrained(args.model_path)
  File "/home/hehuang/dev/git/PaddleNLP/paddlenlp/transformers/auto/modeling.py", line 597, in from_pretrained
    return cls._from_pretrained(pretrained_model_name_or_path, *model_args,
  File "/home/hehuang/dev/git/PaddleNLP/paddlenlp/transformers/auto/modeling.py", line 282, in _from_pretrained
    raise RuntimeError(
RuntimeError: Can't load weights for './ernie-layoutx-base-uncased/models/docvqa_zh/'. Please make sure that './ernie-layoutx-base-uncased/models/docvqa_zh/' is:

  • a correct model-identifier of built-in pretrained models,
  • or a correct model-identifier of community-contributed pretrained models,
  • or the correct path to a directory containing relevant modeling files(model_weights and model_config)

hehuang139 · Oct 27 '22 07:10

I looked at the source code: every save_steps, the model is first saved into a checkpoint directory; a checkpoint also stores the model, saved via model.save as well. Only after training finishes is the best model (after comparison) saved to output_dir. In other words, exporting the model from output_dir is only possible once training completes.

If you want to use a checkpoint model, that also works: just specify the local model path as the checkpoint directory.

That is, to run prediction with a fine-tuned checkpoint, change --model_name_or_path to point to the local checkpoint path:

python3 -u run_mrc.py \
    --model_name_or_path ./ernie-layoutx-base-uncased/models/docvqa_zh/checkpoint-40000 \
    --output_dir ./predict_result \
    --dataset_name docvqa_zh \
    --do_predict \
    --lang "ch" \
    --num_train_epochs 6 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.05 \
    --weight_decay 0 \
    --pattern "mrc" \
    --use_segment_box false \
    --return_entity_level_metrics false \
    --overwrite_cache false \
    --doc_stride 128 \
    --target_size 1000 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --learning_rate 2e-5 \
    --preprocessing_num_workers 32 \
    --save_total_limit 1 \
    --train_nshard 16 \
    --seed 1000 \
    --metric_for_best_model anls \
    --greater_is_better true

hehuang139 · Oct 27 '22 08:10

Exporting from a checkpoint also works: just point --model_path to the checkpoint directory.
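
For example, rerunning the export command from the log above, but with --model_path pointing at the checkpoint directory (paths follow my local layout):

python3 export_model.py \
    --model_path ./ernie-layoutx-base-uncased/models/docvqa_zh/checkpoint-40000 \
    --task_type mrc \
    --output_path ./mrc_export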

hehuang139 · Oct 27 '22 08:10

Where can I download the docvqa_zh dataset?

chaiyixuan · Nov 10 '22 07:11

> Where can I download the docvqa_zh dataset?

Just run the reading-comprehension script run_mrc.py and it downloads automatically. This goes through Paddle's datasets mechanism and its _split_generators (it works exactly like huggingface datasets, if you are familiar with HF). For the dataset code, see the module paddlenlp/datasets/hf_datasets/docvqa_zh.py: there, _split_generators splits the dataset and downloads it through the download_manager from _URL = "https://bj.bcebos.com/paddlenlp/datasets/docvqa_zh.tar.gz". There is no need to download it yourself. You could download it manually and put it under the .cache/huggingface** directory, but the downloaded files are hash-named, so just dropping the file in there won't work.
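
If you want to trigger the download yourself outside of run_mrc.py, a minimal sketch is below (it assumes the huggingface datasets package is installed and that you run it from the PaddleNLP repo root; it only illustrates the loading-script mechanism and is not the exact call run_mrc.py makes):

from datasets import load_dataset

# Illustration only: load docvqa_zh through the HF-style loading script shipped with
# PaddleNLP. The script's _split_generators() downloads docvqa_zh.tar.gz from _URL via
# its download manager on first use and caches it (hash-named) under ~/.cache/huggingface.
dataset = load_dataset("paddlenlp/datasets/hf_datasets/docvqa_zh.py")
print(dataset)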

hehuang139 · Nov 11 '22 14:11

> Where can I download the docvqa_zh dataset?

The download from Baidu (bcebos) is also fast, so no need to worry.

hehuang139 · Nov 11 '22 14:11