Baichuan2
z_loss_weight defaults to 0, and the provided finetune example also uses 0. So is z-loss actually unused?
z-loss was adopted in our training, but it is not strictly necessary, so we turned it off in the open-source code.
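For context, z-loss (introduced in the PaLM paper) penalizes the squared log of the softmax normalizer to keep logits from drifting during training. Below is a minimal sketch of how it can be combined with cross-entropy; the function name and signature are illustrative, not Baichuan2's actual implementation.

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_z_loss(logits, labels, z_loss_weight=0.0):
    """Cross-entropy plus an auxiliary z-loss term.

    z-loss penalizes log(Z)^2, where Z is the softmax normalizer,
    discouraging logit drift that can destabilize training.
    Note: hypothetical helper, not the Baichuan2 source.
    """
    # Standard next-token cross-entropy over the vocabulary.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         labels.reshape(-1))
    if z_loss_weight == 0.0:
        # z_loss_weight = 0 (the repo default) disables the term entirely.
        return ce
    # log(Z) is the logsumexp of the logits over the vocab dimension.
    log_z = torch.logsumexp(logits, dim=-1)
    z_loss = z_loss_weight * (log_z ** 2).mean()
    return ce + z_loss
```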
hi @mmmans, do you mean it's unnecessary at the finetune stage?
Not necessarily; it depends on your setup.
oh, I see. thx a lot
@mmmans I have added thousands of new tokens and am doing full-parameter finetuning. Do I need to set z_loss_weight?
It depends on your own setup. If your training does not exhibit instability, there is no need to set z_loss.
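If instability does appear, a small coefficient is typical; PaLM reports using 1e-4. The call below reuses the hypothetical helper sketched above, and the value is a starting point, not a tuned recommendation.

```python
# Hypothetical usage: enable z-loss with a small weight (PaLM reports 1e-4)
# only if the run shows instability; tune the value for your own setup.
loss = cross_entropy_with_z_loss(logits, labels, z_loss_weight=1e-4)
```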
@mmmans thx~
@mmmans My loss stays around 6.x and does not converge. Should I set z_loss_weight?