echoht
The source code decides whether to enable P-Tuning v2 based on whether the pre_seq_len parameter is set. The code uses P-Tuning v2, starting at line 852 of modeling_chatglm.py.
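For reference, a minimal sketch of how this toggle is usually wired up when loading the model through Hugging Face transformers. The repo id, the pre_seq_len value, and the "prefix_encoder" parameter-name filter are illustrative assumptions, not taken from this thread:

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
config.pre_seq_len = 128           # setting this is what switches on the P-Tuning v2 branch
config.prefix_projection = False   # False = P-Tuning v2 style (no MLP reparameterization of the prefix)

model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)

# Typically only the prefix encoder is trained; the transformer backbone stays frozen.
for name, param in model.named_parameters():
    param.requires_grad = "prefix_encoder" in name
```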
> @echoht Then how should I set things up if I want to use plain P-Tuning? From the paper, P-Tuning also uses a soft prompt at the embedding, right?

You can try modifying the code yourself; I haven't tried it on my end.
> Looking at the source code, it is implemented at all layers.

The source is at line 136 of modeling_chatglm.py, and the code is consistent with P-Tuning v2.
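To illustrate why the prefix ends up in every layer rather than only the input embedding, here is a rough sketch of a P-Tuning v2 style prefix encoder. The class name, argument names, and dimensions are assumptions for illustration; the real shapes live in modeling_chatglm.py:

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Maps prefix positions to packed key/value prefixes for every layer."""
    def __init__(self, pre_seq_len, num_layers, hidden_size):
        super().__init__()
        # The embedding width packs keys and values for all transformer layers,
        # which is why the learned prefix is injected into every layer.
        self.embedding = nn.Embedding(pre_seq_len, num_layers * 2 * hidden_size)

    def forward(self, prefix_ids):
        return self.embedding(prefix_ids)

pre_seq_len, num_layers, hidden_size = 128, 28, 4096
encoder = PrefixEncoder(pre_seq_len, num_layers, hidden_size)
prefix_ids = torch.arange(pre_seq_len).unsqueeze(0)            # (1, pre_seq_len)
past = encoder(prefix_ids)                                      # (1, pre_seq_len, num_layers*2*hidden)
past = past.view(1, pre_seq_len, num_layers * 2, hidden_size)   # split into per-layer key/value slices
print(past.shape)
```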
> If you want deterministic results, set both top_p and temperature to 0.01 and set do_sample to False; the difference is probably in the do_sample parameter, give it a try.

If do_sample is already set to False, do top_p and temperature still have any effect?
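A minimal sketch with the generic Hugging Face generate() API (model id and prompt are placeholders): with do_sample=False, decoding is greedy, so the top_p and temperature values are simply not used.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,    # greedy decoding: deterministic for the same inputs
    top_p=0.01,         # ignored when do_sample=False
    temperature=0.01,   # ignored when do_sample=False
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```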
How does this work? Do you use DeepSpeed? Can you share the DeepSpeed config?
What's the max_seq_len in your fine-tuning stage? Did you ever try 2048?
@min-xu-ai Hi, do you have any clue about ZeRO-R?
Is there any conclusion yet on why the performance dropped?
Hi, can you share the scale of your training loss? The following is mine. The loss at the very beginning is very large! I use DeepSpeed and gradient checkpointing.
I have a question about my initial loss, which is around 8. This seems very strange. What factors could lead to such a large loss?