Why does the PPO model need to be wrapped in AutoModelForCausalLMWithValueHead?

Open jiahuanluo opened this issue 1 year ago • 1 comments

Thanks for the work! Why does the PPO model here need a value head attached? https://github.com/beyondguo/LLM-Tuning/blob/ed68123815bc0add9ad2d7ddc2a48dc584db2c94/RLHF/rl_training.py#L185C1-L185C11 This head seems to be randomly initialized?

jiahuanluo avatar Aug 29 '23 09:08 jiahuanluo

Because there is also a critic model.

nghuyong avatar Oct 10 '23 08:10 nghuyong
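
To make the critic's role concrete, here is a minimal sketch of how per-token value estimates feed into the PPO update. It assumes generalized advantage estimation (GAE) as the advantage estimator; the function name `gae_advantages` and the toy numbers are hypothetical and not taken from `rl_training.py`. The `values` list stands in for what the value head would output for each token. Since the head is newly added on top of the language model, it starts out untrained and is typically learned during PPO itself through a value loss.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards: per-step rewards, length T.
    values:  critic estimates V(s_0..s_T), length T + 1
             (the last entry is the bootstrap value, 0.0 at episode end).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the accumulated future term.
    for t in reversed(range(len(rewards))):
        # TD error: how much better this step went than the critic predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Toy example (hypothetical numbers): reward arrives only on the last token.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]  # stand-in for value head outputs
advantages = gae_advantages(rewards, values, gamma=1.0, lam=1.0)
print(advantages)
```

Without the value estimates there is no baseline to compare the reward against, which is why the policy model carries the extra head.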