
RLHF

Open yangliuIOC opened this issue 1 year ago • 3 comments

Describe the Question

Please provide a clear and concise description of what the question is.

Describe your attempts

  • [ ] I walked through the tutorials
  • [ ] I checked the documentation
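  • [ ] I checked to make sure that this is not a duplicate question

The reward model should be a model trained on human rankings, right? That model is then used to reward the SFT model. Looking at your reward model, it is really just another SFT model for a vertical domain. Doesn't that defeat the purpose of the HF part, leaving only RL? And in that case, why not just use the reward model as the final model?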
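For context, here is a minimal sketch of the pairwise ranking loss commonly used to train a reward model from human preference pairs. It is not taken from MedicalGPT; the function name `pairwise_ranking_loss` and the scalar scores are assumptions made for illustration. The point it shows is that the reward model outputs a scalar score per response and is fit to human rankings, which is different from generating text.

```python
import math


def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(r_chosen - r_rejected)) for one preference pair.

    reward_chosen / reward_rejected are the scalar scores a (hypothetical)
    reward model assigns to the human-preferred and the rejected response.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # The loss is small when the preferred response already scores higher,
    # and large when the ranking is violated.
    print(pairwise_ranking_loss(2.0, 0.0))
    print(pairwise_ranking_loss(0.0, 2.0))
```

Under this objective the human feedback enters through the ranked pairs, regardless of which pretrained or SFT checkpoint the reward model is initialized from.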

yangliuIOC • Jun 16 '23 03:06