GLM: continued training from the 10B model, loss only drops from 11 to 5

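I am doing continued training from the 10B model, and the loss has only dropped from 11 to 5. Generally speaking, what value does the loss end up at once it converges? I used 120,000 texts with an average text length of about 5,000. Training parameters: gpus=8, max length=1024, batchsize=8, gradient accumulation=2, lr=7e-6, total iters=5000, roughly 5 epochs.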
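As a sanity check on the "roughly 5 epochs" figure, here is a minimal sketch of the arithmetic. It assumes that batchsize=8 is per GPU, that each text counts as one training sample (truncated to max length), and that one iter is one optimizer step after gradient accumulation. None of these is confirmed above, and the function name is just illustrative.

```python
def epochs_seen(num_samples, gpus, per_gpu_batch, grad_accum, iters):
    """Approximate number of passes over the dataset.

    Assumes per-GPU batch size and one sample per text (hypothetical
    interpretation of the parameters listed in the post).
    """
    samples_per_step = gpus * per_gpu_batch * grad_accum
    return samples_per_step * iters / num_samples


# Values taken from the post: 120,000 texts, 8 GPUs, batch 8, accumulation 2, 5000 iters.
epochs = epochs_seen(num_samples=120_000, gpus=8, per_gpu_batch=8, grad_accum=2, iters=5000)
print(f"samples per step: {8 * 8 * 2}, approx. epochs: {epochs:.2f}")
```

Under these assumptions each step consumes 128 samples, so 5000 iters cover 640,000 samples, or about 5.3 passes over 120,000 texts, which is consistent with the figure quoted above. If batchsize=8 were the global batch instead, the same run would cover well under one epoch.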

@jeffra @samyam @tjruwase @WrRan

Apr 16 '23 09:04 TccccD

The 10b-Chinese model I downloaded cannot be decompressed; it fails with an error. How did you download it?

Apr 17 '23 11:04 shuangt

I haven't used it on Windows.

Apr 17 '23 11:04 TccccD

How did you set up the continued training?

Apr 18 '23 10:04 superhg

How did you continue the pretraining?

May 31 '23 03:05 runzhi214

Have you found an answer to this question? What loss level is generally considered acceptable?

Jul 21 '23 09:07 parkLGW

Same question here. When fine-tuning GLM10B I got the loss curve below, but I'm not sure how to check whether the loss is valid or reasonable.
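One rough way to put a loss value in context is to convert it to perplexity and compare it with the loss of a uniform guess over the vocabulary. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (natural log), which I have not verified for this training script. The vocabulary size is a hypothetical placeholder, not the actual GLM-10B value.

```python
import math


def perplexity(mean_nll):
    """Perplexity for a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll)


def uniform_baseline_loss(vocab_size):
    """Loss of a model that assigns equal probability to every token."""
    return math.log(vocab_size)


HYPOTHETICAL_VOCAB_SIZE = 50_000  # placeholder, replace with the real tokenizer size

for loss in (11.0, 5.0):
    print(f"loss={loss:.1f} -> perplexity={perplexity(loss):.1f}")
print(f"uniform baseline loss: {uniform_baseline_loss(HYPOTHETICAL_VOCAB_SIZE):.2f}")
```

If the starting loss sits close to the uniform baseline for the real vocabulary size, it may be worth checking that the pretrained checkpoint and tokenizer were actually loaded, since a pretrained model would normally start well below that level. Comparing against the loss on a held-out split of the same data is another simple check.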

Jul 25 '23 13:07 shmily326