echoht

11 comments by echoht

The source code decides whether to enable P-Tuning v2 based on whether the pre_seq_len parameter is set. The code uses P-Tuning v2; it starts at line 852 of modeling_chatglm.py.
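For reference, here is a minimal sketch (not the exact ChatGLM-6B code; the class and argument names are illustrative assumptions) of how a pre_seq_len flag typically gates a P-Tuning v2 prefix encoder, whose learned prompts reach every transformer layer:

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Maps virtual prefix token ids to per-layer key/value prompts (P-Tuning v2 style)."""
    def __init__(self, pre_seq_len, num_layers, num_heads, head_dim):
        super().__init__()
        # One embedding per virtual prefix token, projected to 2 (key + value)
        # tensors per layer, so the prompt is injected into all layers,
        # not just the input embedding.
        self.embedding = nn.Embedding(pre_seq_len, num_layers * 2 * num_heads * head_dim)

    def forward(self, prefix_ids):
        return self.embedding(prefix_ids)

class GLMLikeModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.pre_seq_len = getattr(config, "pre_seq_len", None)
        if self.pre_seq_len is not None:
            # P-Tuning v2 path: the backbone stays frozen, only the prefix is trained.
            self.prefix_tokens = torch.arange(self.pre_seq_len).long()
            self.prefix_encoder = PrefixEncoder(
                self.pre_seq_len, config.num_layers, config.num_heads, config.head_dim
            )
        # ... backbone layers would be built here ...
```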

> @echoht Then how should it be configured if we use P-Tuning instead? From the paper, P-Tuning also uses soft prompts at the embedding layer, right?

You can try modifying the code yourself; I haven't tried it on my side.
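If you do want to experiment with P-Tuning (v1)-style prompts, the usual change is to prepend learned vectors only at the input embedding layer rather than at every layer. A rough sketch, with made-up names and no claim about how the ChatGLM repo itself does it:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """P-Tuning v1 style: learnable prompt vectors prepended to the input
    embeddings only; deeper layers see them like ordinary tokens."""
    def __init__(self, prompt_len, hidden_size):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: [batch, seq_len, hidden_size]
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```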

> Looking at the source code, it is implemented in all layers.

The source is at line 136 of modeling_chatglm.py, and the code is consistent with P-Tuning v2. ![企业微信截图_16844866285937](https://github.com/THUDM/ChatGLM-6B/assets/48375360/819154ad-ee3e-4b71-899e-5a916e847cf8)

> If you want deterministic results, set both top_p and temperature to 0.01 and set do_sample to False. The difference is probably in the do_sample parameter; give it a try.

If do_sample is already set to False, do top_p and temperature still have any effect?
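In the Hugging Face generation API, do_sample=False means greedy decoding, so top_p and temperature are not used; they only matter once do_sample=True. A small sketch, assuming the ChatGLM-6B checkpoint (any causal LM behaves the same way here):

```python
from transformers import AutoModel, AutoTokenizer

# Assumption: ChatGLM-6B loaded on a GPU; adjust the checkpoint/device as needed.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

inputs = tokenizer("你好", return_tensors="pt").to(model.device)

# Greedy decoding: do_sample=False takes the argmax token at every step,
# so top_p and temperature are ignored (newer transformers versions warn
# if you pass them alongside do_sample=False).
greedy = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Sampling: only with do_sample=True do temperature and top_p take effect;
# very small values (0.01) make sampling nearly deterministic, close to greedy.
sampled = model.generate(**inputs, max_new_tokens=64, do_sample=True,
                         temperature=0.01, top_p=0.01)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
```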

How does this work? Do you use DeepSpeed? Could you share your DeepSpeed config?
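For context, a DeepSpeed ZeRO stage-2 config of the kind commonly used with the Hugging Face Trainer might look like the sketch below. This is only an illustrative assumption, not the config the author actually used:

```python
# Illustrative ZeRO stage-2 config (an assumption, not the author's actual setup).
# With the Hugging Face Trainer it can be passed as a dict via
# TrainingArguments(deepspeed=ds_config) or saved to a JSON file.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```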

What max_seq_len do you use in the finetuning stage? Did you ever try 2048?

@min-xu-ai Hi, do you have any clue about ZeRO-R?

Hi, can you share the scale of your training loss? Mine is shown below; the loss at the very beginning is very large. I use DeepSpeed and gradient checkpointing. ![image](https://github.com/tatsu-lab/stanford_alpaca/assets/48375360/4fe1ecef-3030-483a-8d33-cb2df888da0b)

I have a question about my initial loss, which is about 8. This seems very strange. What factors could lead to such a large loss?
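As a back-of-the-envelope sanity check on the scale (assuming a 32k-token LLaMA-style vocabulary, as Alpaca uses): a model whose predictions carry no information yet has a cross-entropy of ln(V), so a loss around 8 in the first steps sits below that ceiling rather than being an anomaly by itself.

```python
import math

vocab_size = 32000  # assumption: LLaMA-style vocabulary used by Alpaca
uniform_loss = math.log(vocab_size)  # cross-entropy if every token were equally likely
print(round(uniform_loss, 2))  # 10.37 -> an initial loss near 8 is within the expected range
```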