WebGLM

Published 20 hours ago


About the training of the human preference model

Open dongxqm opened this issue 1 year ago • 2 comments

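Hello, I saw in the paper that the final comparison training uses a scoring and ranking model built on a single linear layer. Does that mean this step does not use reinforcement learning?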

Jul 19 '23 05:07 dongxqm

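Yes, we have not yet used reinforcement learning to train our model. The human preference model is currently used only to filter the model's answers.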
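For readers following the thread, here is a rough sketch of what "filtering answers with a preference model" can look like. It is not the WebGLM code. The names `linear_score`, `toy_features`, `select_best`, and the weights are hypothetical, and the hand-made features stand in for whatever representation the real scorer uses.

```python
from typing import Callable, List, Sequence


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical linear scoring head: maps a feature vector to one scalar score."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def toy_features(answer: str) -> List[float]:
    """Toy featurizer (assumption): word count and number of '[' citation marks."""
    return [float(len(answer.split())), float(answer.count("["))]


# Illustrative weights only; a trained preference model would learn these.
WEIGHTS = [0.1, 1.0]


def toy_score(answer: str) -> float:
    """Score one candidate answer with the linear head."""
    return linear_score(toy_features(answer), WEIGHTS)


def select_best(candidates: List[str], score_fn: Callable[[str], float]) -> str:
    """Filtering step: keep the candidate with the highest score. No RL update happens."""
    return max(candidates, key=score_fn)


candidates = [
    "Paris.",
    "Paris is the capital of France [1].",
    "I am not sure.",
]
best = select_best(candidates, toy_score)
print(best)
```

The point of the sketch is that the scorer only ranks generated candidates at inference time. The generator's weights are never updated from the score, which is what separates this from reinforcement learning.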

Jul 24 '23 07:07 hanyullai

OK, thank you for your answer.

Jul 24 '23 09:07 xqun3