# Issues of rghosh08 (2 results)

## Description

This PR addresses [Feature Request] multi-turn reward for RLHF (#2271). It implements the reward system for multi-turn reinforcement learning from human feedback (RLHF), following the guidelines outlined...
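The PR description is truncated, but a multi-turn RLHF reward generally aggregates per-turn scores over a conversation rather than scoring only the final response. As a minimal sketch under that assumption (`multi_turn_reward`, `score_turn`, and the toy scorer below are hypothetical names, not the PR's actual API):

```python
from typing import Callable, List, Tuple

def multi_turn_reward(
    turns: List[Tuple[str, str]],
    score_turn: Callable[[str, str], float],
    gamma: float = 0.95,
) -> float:
    """Aggregate per-turn rewards over a conversation.

    Each turn is a (prompt, response) pair; turn t's score is
    discounted by gamma**t so earlier turns weigh more.
    """
    total = 0.0
    for t, (prompt, response) in enumerate(turns):
        total += (gamma ** t) * score_turn(prompt, response)
    return total

# Toy scorer standing in for a real reward model: favors longer responses.
def toy_score(prompt: str, response: str) -> float:
    return min(len(response) / 100.0, 1.0)

conversation = [
    ("Hi", "Hello! How can I help you today?"),
    ("Summarize RLHF", "RLHF fine-tunes a model against a learned reward..."),
]
reward = multi_turn_reward(conversation, toy_score)
```

In a real implementation the discounting scheme, and whether rewards are summed, averaged, or assigned per token, would follow whatever the linked feature request (#2271) specifies.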

Labels: enhancement, CLA Signed

It would be good to have llama-guard on the model list. Thanks!