MOSS-RLHF Has anyone compared this training framework to TRL?

Open StarrySeas1 opened this issue 1 year ago • 1 comments

TRL's PPO implementation is simpler than this one and uses less memory, while this framework has an additional, separate value network. I don't know which framework is more stable and effective.

Mar 25 '24 12:03 StarrySeas1

TRL does drop the separate value function network, but it may be harder to train as a result, because the policy and the value function share parameters. On the other hand, the TRL library wraps many optimizations into its code, whereas our code adds no extra optimization methods, which makes it easier to understand and modify.
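
To make the memory side of this trade-off concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not code from either library: `ToyConfig` and all function names are hypothetical, and the count of about 12·h² weights per transformer layer is a common approximation that ignores biases and layer norms. It compares the number of trainable parameters when the value function is a small head on the policy backbone (the shared layout described above) with the number when the critic is a full second network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical model shape used only for this estimate."""
    hidden_size: int
    num_layers: int
    vocab_size: int


def backbone_params(cfg: ToyConfig) -> int:
    # Assumption: ~12 * h^2 weights per layer (4h^2 attention + 8h^2 MLP),
    # plus one token embedding matrix; biases and norms are ignored.
    per_layer = 12 * cfg.hidden_size ** 2
    embeddings = cfg.vocab_size * cfg.hidden_size
    return cfg.num_layers * per_layer + embeddings


def value_head_params(cfg: ToyConfig) -> int:
    # A single linear layer mapping a hidden state to one scalar value.
    return cfg.hidden_size + 1


def shared_layout_params(cfg: ToyConfig) -> int:
    # Policy and value function share one backbone; only a head is added.
    return backbone_params(cfg) + value_head_params(cfg)


def separate_layout_params(cfg: ToyConfig) -> int:
    # Policy backbone plus a second, independent critic backbone and head.
    return 2 * backbone_params(cfg) + value_head_params(cfg)


if __name__ == "__main__":
    cfg = ToyConfig(hidden_size=4096, num_layers=32, vocab_size=32000)
    shared = shared_layout_params(cfg)
    separate = separate_layout_params(cfg)
    print(f"shared layout:   {shared:,} trainable parameters")
    print(f"separate layout: {separate:,} trainable parameters")
    print(f"ratio: {separate / shared:.2f}x")
```

Under these assumptions the separate critic roughly doubles the trainable parameters, and with them the gradient and optimizer state. The estimate leaves out the frozen reference and reward models, and it says nothing about training stability, where the shared layout makes the policy and value losses update the same weights.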

Apr 28 '24 06:04 refrain-wbh
