llm_rlhf

Published 20 hours ago •


Question about the underlying principle

Open magnificent1208 opened this issue 2 years ago • 0 comments

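The reward model is a scoring model. Could human scoring replace it? If so, we would only need to build the triples and their corresponding scores, and could then train the model with a reinforcement learning approach. Is that right?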
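To make the question concrete, here is a minimal sketch of the setup being asked about. It is not code from this repository. It assumes each record is a hypothetical (prompt, response, human_score) entry, uses a toy softmax policy over two made-up responses, and applies a REINFORCE-style update with the human score as the reward. All names and data below are invented for illustration.

```python
import math

# Hypothetical human-scored records: (prompt, response, human_score).
# Prompts, responses, and scores are made up for illustration only.
HUMAN_SCORED = [
    ("greet", "hello", 1.0),
    ("greet", "go away", -1.0),
]


def softmax(logits):
    """Turn a dict of logits into a dict of probabilities."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def reinforce_step(policy_logits, records, lr=0.5):
    """One REINFORCE-style pass that uses human scores as rewards."""
    for prompt, response, score in records:
        logits = policy_logits[prompt]
        probs = softmax(logits)  # probabilities before this update
        for cand in logits:
            # d log pi(response | prompt) / d logit[cand]
            grad = (1.0 if cand == response else 0.0) - probs[cand]
            logits[cand] += lr * score * grad
    return policy_logits


def train(steps=20):
    """Train the toy policy and return its final response probabilities."""
    policy = {"greet": {"hello": 0.0, "go away": 0.0}}
    for _ in range(steps):
        reinforce_step(policy, HUMAN_SCORED)
    return softmax(policy["greet"])


if __name__ == "__main__":
    print(train())
```

One caveat with this sketch: it reuses a fixed set of scored responses, whereas RLHF pipelines usually sample fresh responses from the current policy at every step, and each of those would need a new human score. That labeling cost is a common reason for training a reward model instead of scoring by hand.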

Sep 14 '23 14:09 magnificent1208