How to set training parameters when running the GPT model in PaddleNLP

Repo: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/gpt

Training parameters from the docs:

CUDA_VISIBLE_DEVICES=0 python run_pretrain.py \
    --model_type gpt \
    --model_name_or_path gpt2-en \
    --input_dir "./data" \
    --output_dir "output" \
    --weight_decay 0.01 \
    --grad_clip 1.0 \
    --max_steps 500000 \
    --save_steps 100000 \
    --decay_steps 320000 \
    --warmup_rate 0.01 \
    --micro_batch_size 4 \
    --device gpu

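Does anyone know how these parameters are calculated? For example, if my dataset has 10,000 samples, how should I customize them? Is there any documentation on this?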

Jul 26 '22 03:07 syy-love

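Count the tokens. With the configuration here, max_steps * seq_len * batch_size = total_tokens, i.e. 500000 * 1024 * 4 ≈ 2B, roughly 2 billion tokens. Assuming your 10,000 samples average 256 tokens each, the corpus has 10000 * 256 = 2.56M tokens.

Dividing the two, 500000 * 1024 * 4 / (10000 * 256) = 800, which is equivalent to training for 800 epochs. That really is a lot of training. You could try setting max_steps to 50,000, which is only 80 epochs.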
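The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate: the function names are hypothetical and not part of PaddleNLP, the sequence length of 1024 and the average sample length of 256 are the values assumed in this reply, and the batch size is taken to be micro_batch_size on a single GPU.

```python
import math


def total_training_tokens(max_steps, seq_len, batch_size):
    """Tokens consumed over the whole run: max_steps * seq_len * batch_size."""
    return max_steps * seq_len * batch_size


def corpus_tokens(num_samples, avg_sample_len):
    """Approximate corpus size in tokens (avg_sample_len is an assumption)."""
    return num_samples * avg_sample_len


def epochs_for_steps(max_steps, seq_len, batch_size, num_samples, avg_sample_len):
    """How many passes over the corpus a given max_steps corresponds to."""
    return total_training_tokens(max_steps, seq_len, batch_size) / corpus_tokens(
        num_samples, avg_sample_len
    )


def steps_for_epochs(target_epochs, seq_len, batch_size, num_samples, avg_sample_len):
    """Inverse direction: the max_steps needed to reach a target epoch count."""
    tokens_needed = target_epochs * corpus_tokens(num_samples, avg_sample_len)
    return math.ceil(tokens_needed / (seq_len * batch_size))


if __name__ == "__main__":
    # Values from the thread: 10,000 samples, assumed average length 256,
    # seq_len 1024, micro_batch_size 4 on one GPU.
    print(epochs_for_steps(500000, 1024, 4, 10000, 256))
    print(steps_for_epochs(80, 1024, 4, 10000, 256))
```

To adapt this to your own data, replace the average length with a value measured on your tokenized corpus, and multiply the batch size by the number of GPUs if you train on more than one card.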

Jul 26 '22 12:07 ZHUI

This issue is stale because it has been open for 60 days with no activity.

Dec 08 '22 06:12 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Dec 22 '22 16:12 github-actions[bot]
