Baichuan2
z_loss_weight defaults to 0, and the provided finetune example also uses 0. So is z-loss actually unused?
z-loss was adopted in our training, but it is not strictly necessary, so we turned it off in the open-source code.
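For context, z-loss (introduced in the PaLM paper) penalizes the squared log of the softmax normalizer to keep logits from drifting during training. Below is a minimal sketch of how it can be combined with cross-entropy; the function name and signature are illustrative, not Baichuan2's actual implementation.

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_z_loss(logits, labels, z_loss_weight=0.0):
    """Cross-entropy plus an auxiliary z-loss term.

    z-loss penalizes log(Z)^2, where Z is the softmax normalizer,
    discouraging logit drift that can destabilize training.
    Note: hypothetical helper, not the Baichuan2 source.
    """
    # Standard next-token cross-entropy over the vocabulary.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         labels.reshape(-1))
    if z_loss_weight == 0.0:
        # z_loss_weight = 0 (the repo default) disables the term entirely.
        return ce
    # log(Z) is the logsumexp of the logits over the vocab dimension.
    log_z = torch.logsumexp(logits, dim=-1)
    z_loss = z_loss_weight * (log_z ** 2).mean()
    return ce + z_loss
```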
hi @mmmans, do you mean it's unnecessary at the finetune stage?
Not necessarily; it depends on your setup.
oh, I see. thx a lot
@mmmans I have added thousands of new tokens and am doing full-parameter finetuning. Do I need to set z_loss_weight?
It depends on your own setup. If your training does not exhibit instability, there is no need to set z_loss.
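If instability does appear, a small coefficient is typical; PaLM reports using 1e-4. The call below reuses the hypothetical helper sketched above, and the value is a starting point, not a tuned recommendation.

```python
# Hypothetical usage: enable z-loss with a small weight (PaLM reports 1e-4)
# only if the run shows instability; tune the value for your own setup.
loss = cross_entropy_with_z_loss(logits, labels, z_loss_weight=1e-4)
```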
@mmmans thx~
@mmmans My loss stays around 6.x and does not converge. Should I set z_loss_weight?