RLHFlow

3 repositories owned by RLHFlow

RLHF-Reward-Modeling

1.5k stars, 103 forks, 1.5k watchers

Recipes for training reward models for RLHF.

Online-RLHF

536 stars, 49 forks, 536 watchers

A recipe for online RLHF and online iterative DPO.