MedicalGPT RLHH

RLHH

Open yangliuIOC opened this issue 1 year ago • 3 comments

Please provide a clear and concise description of what the question is.

[ ] I walked through the tutorials
[ ] I checked the documentation
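[ ] I checked to make sure that this is not a duplicate question

The reward model should be a model trained on human ranking data, which is then used to reward the SFT model, right? Looking at your reward model, it is really just another SFT model for the vertical domain. Doesn't that defeat the purpose of HF? It amounts to RL only. In that case, why not use the reward model directly as the final model?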

Jun 16 '23 03:06 yangliuIOC

The reward model is a ranking model, i.e. a scoring model. You can look at the code implementation; the loss is computed with KL divergence.
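To make the "ranking / scoring model" idea concrete, here is a minimal sketch of how a reward model can be trained on human preference pairs. It is not taken from the repository: the reply above mentions KL divergence, whereas this sketch assumes the pairwise log-sigmoid loss that is common in RLHF reward modeling, and the function name `pairwise_ranking_loss` is hypothetical. The scores stand in for the scalar outputs a reward model would assign to the human-preferred ("chosen") and the less preferred ("rejected") answer to the same prompt.

```python
import math


def pairwise_ranking_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over preference pairs.

    Assumption: each pair holds the reward model's scalar score for the
    human-preferred answer and for the rejected answer to one prompt.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen and rejected scores must have the same length")
    if not chosen_scores:
        raise ValueError("at least one preference pair is required")

    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) == log(1 + exp(-m)); written in a numerically
        # stable form so large negative margins do not overflow exp().
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)


# The loss is small when the preferred answer scores higher, and large
# when the model ranks the pair the wrong way round.
good_ranking = pairwise_ranking_loss([2.0], [0.0])
bad_ranking = pairwise_ranking_loss([0.0], [2.0])
print(good_ranking < bad_ranking)
```

Under this assumption the human feedback enters through the chosen/rejected labels, which is why the reward model only learns to score answers and cannot replace the final generative model.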

Also, what is RLHH?

Jun 16 '23 05:06 shibing624

What I mean is that the reward model is trained from HF (human feedback), not from SFT.

Jun 16 '23 08:06 yangliuIOC

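The reward model can be trained starting from the post-SFT model, or starting from another pretrained model such as RoBERTa. I never wrote that the RM is trained by SFT.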

Jun 16 '23 09:06 shibing624