Results: 2 issues of rghosh08
## Description
This PR addresses: [Feature Request] multi-turn reward for RLHF #2271

This PR implements the reward system for multi-turn reinforcement learning from human feedback (RLHF), following the guidelines outlined...
enhancement
CLA Signed
It would be good to have llama-guard on the model list. Thanks.