Question about static-graph training for Transformer machine translation

Open dlkht opened this issue 3 years ago • 2 comments

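Hello!

The machine translation example under examples/machine_translator/transformer has two training programs: a dynamic-graph one and a static-graph one (static). The static train.py has no init_from_checkpoint code. I first trained a dynamic-graph model and then wanted to continue training from that checkpoint with the static train.py, so I added the following code to transformer/static/train.py (following the dynamic train.py):

    if args.init_from_checkpoint:
        paddle.disable_static()
        model_dict = paddle.load(
            os.path.join(args.init_from_checkpoint, "transformer.pdparams"))
        #opt_dict = paddle.load(
        #    os.path.join(args.init_from_checkpoint, "transformer.pdopt"))
        transformer.set_state_dict(model_dict)
        #optimizer_0.set_state_dict(opt_dict)
        print("loaded from checkpoint.")
        paddle.enable_static()

With this change, training appears to resume from the dynamic-graph checkpoint.

1. Is this approach correct? What is the proper way to do it?
2. With this approach, optimizer_0.set_state_dict(opt_dict) fails. Why?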

dlkht · Aug 31 '22 08:08

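A static graph cannot directly load dynamic-graph parameters. If you really need to do this, you can first try the approach used here: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/ops/faster_transformer/transformer/faster_transformer.py#L436

Also, may I ask why you need to switch to static-graph training after training with the dynamic graph first?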

FrostML · Sep 01 '22 05:09
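
As an illustration of why the two formats do not line up directly (this is not code from the thread or from the linked file): a dynamic-graph state dict is usually keyed by structured names such as `linear.weight`, while a static program looks variables up by their internal names. The sketch below only shows the re-keying step in plain Python. The function `remap_state_dict`, the `name_map` argument, and the toy names are all hypothetical; how to obtain the real mapping for the Transformer example has to be checked against the linked code.

```python
from typing import Any, Dict, Mapping


def remap_state_dict(dygraph_state: Mapping[str, Any],
                     name_map: Mapping[str, str]) -> Dict[str, Any]:
    """Re-key a dynamic-graph state dict by static-graph variable names.

    dygraph_state: structured name -> value (for example a NumPy array).
    name_map: structured name -> static variable name. This mapping is an
        assumption of the sketch; you have to build it for your own model.
    """
    # Fail early if some checkpoint entry has no static-graph counterpart.
    missing = [key for key in dygraph_state if key not in name_map]
    if missing:
        raise KeyError(f"no static-graph name for: {missing}")

    remapped: Dict[str, Any] = {}
    for key, value in dygraph_state.items():
        target = name_map[key]
        # Two checkpoint entries must not collapse onto one variable.
        if target in remapped:
            raise ValueError(f"duplicate static-graph name: {target}")
        remapped[target] = value
    return remapped


# Toy data only; these names are illustrative, not taken from the example.
toy_state = {"linear.weight": [[1.0, 2.0]], "linear.bias": [0.5]}
toy_map = {"linear.weight": "linear_0.w_0", "linear.bias": "linear_0.b_0"}
static_state = remap_state_dict(toy_state, toy_map)
print(sorted(static_state))
```

In Paddle's static mode, a dictionary keyed by variable name is the kind of input that `paddle.static.set_program_state(program, state_dict)` accepts. Whether the names and shapes in a given checkpoint actually match the static Transformer program is something to verify before relying on it.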

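OK, thanks! I switched to static-graph training because I want to compress the model with PaddleSlim, and PaddleSlim does not seem to support dynamic-graph models very well.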

dlkht · Sep 01 '22 05:09

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] · Dec 07 '22 07:12

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Dec 21 '22 11:12