OpenRLHF adding length penalty to reward

Open karthik-nexusflow opened this issue 4 months ago • 1 comment

Hi Team, while using the PPO pipeline we sometimes observe spikes in response length. Are there any length-penalty techniques available in OpenRLHF, or any that have been explored?

Mar 06 '24 20:03 karthik-nexusflow
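
For context on what a length penalty on the reward could look like, here is a minimal sketch. It is not an OpenRLHF API and is not taken from this thread: the function name `apply_length_penalty` and the parameters `target_length` and `penalty_per_token` are hypothetical, and the linear penalty on excess tokens is just one simple choice among several possible shaping schemes.

```python
from typing import List, Sequence


def apply_length_penalty(
    rewards: Sequence[float],
    response_lengths: Sequence[int],
    target_length: int = 256,          # hypothetical token budget
    penalty_per_token: float = 0.001,  # hypothetical penalty coefficient
) -> List[float]:
    """Subtract a linear penalty for every token beyond target_length.

    Responses at or under the budget keep their original reward.
    """
    if len(rewards) != len(response_lengths):
        raise ValueError("rewards and response_lengths must have the same length")

    shaped = []
    for reward, length in zip(rewards, response_lengths):
        # Only tokens past the budget are penalized.
        excess = max(0, length - target_length)
        shaped.append(reward - penalty_per_token * excess)
    return shaped


if __name__ == "__main__":
    # Two responses with the same raw reward; the second exceeds the budget.
    print(apply_length_penalty([1.0, 1.0], [200, 400]))
```

In a PPO setup, shaping of this kind would typically be applied to the scalar reward from the reward model before advantages are computed. Where exactly that hook belongs in the OpenRLHF pipeline is not stated in this issue.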
