echoht
The source code decides whether to enable P-Tuning v2 based on whether the pre_seq_len parameter is set. The code uses P-Tuning v2, starting at line 852 of modeling_chatglm.py.
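For reference, a minimal sketch of how this toggle is usually wired up when loading the model through Hugging Face transformers. The repo id, the pre_seq_len value, and the "prefix_encoder" parameter-name filter are illustrative assumptions, not taken from this thread:

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
config.pre_seq_len = 128           # setting this is what switches on the P-Tuning v2 branch
config.prefix_projection = False   # False = P-Tuning v2 style (no MLP reparameterization of the prefix)

model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)

# Typically only the prefix encoder is trained; the transformer backbone stays frozen.
for name, param in model.named_parameters():
    param.requires_grad = "prefix_encoder" in name
```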
> @echoht Then how should I set things up if I want to use plain P-Tuning? From the paper, P-Tuning also uses a soft prompt at the embedding, right?

You can try modifying the code yourself; I haven't tried it on my end.
> Looking at the source code, it is implemented at all layers.

The source is at line 136 of modeling_chatglm.py, and the code is consistent with P-Tuning v2.
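To illustrate why the prefix ends up in every layer rather than only the input embedding, here is a rough sketch of a P-Tuning v2 style prefix encoder. The class name, argument names, and dimensions are assumptions for illustration; the real shapes live in modeling_chatglm.py:

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Maps prefix positions to packed key/value prefixes for every layer."""
    def __init__(self, pre_seq_len, num_layers, hidden_size):
        super().__init__()
        # The embedding width packs keys and values for all transformer layers,
        # which is why the learned prefix is injected into every layer.
        self.embedding = nn.Embedding(pre_seq_len, num_layers * 2 * hidden_size)

    def forward(self, prefix_ids):
        return self.embedding(prefix_ids)

pre_seq_len, num_layers, hidden_size = 128, 28, 4096
encoder = PrefixEncoder(pre_seq_len, num_layers, hidden_size)
prefix_ids = torch.arange(pre_seq_len).unsqueeze(0)            # (1, pre_seq_len)
past = encoder(prefix_ids)                                      # (1, pre_seq_len, num_layers*2*hidden)
past = past.view(1, pre_seq_len, num_layers * 2, hidden_size)   # split into per-layer key/value slices
print(past.shape)
```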
> If you want deterministic results, set both top_p and temperature to 0.01 and set do_sample to False; the difference is probably in the do_sample parameter, give it a try.

If do_sample is already set to False, do top_p and temperature still have any effect?
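A minimal sketch with the generic Hugging Face generate() API (model id and prompt are placeholders): with do_sample=False, decoding is greedy, so the top_p and temperature values are simply not used.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,    # greedy decoding: deterministic for the same inputs
    top_p=0.01,         # ignored when do_sample=False
    temperature=0.01,   # ignored when do_sample=False
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```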
How does this work? Do you use DeepSpeed? Can you share the DeepSpeed config?
What's the max_seq_len in your fine-tuning stage? Did you ever try 2048?
@min-xu-ai Hi, do you have any clue about ZeRO-R?
Is there any conclusion yet on why the performance dropped?
Hi, can you share the scale of your training loss? The following is mine. The loss at the very beginning is very large! I use DeepSpeed and gradient checkpointing.
I have a question about my initial loss, which is around 8. This seems very strange. What factors could lead to such a large loss?