RLHFlow
Results: 3 repositories owned by RLHFlow
Directional-Preference-Alignment
45 Stars, 2 Forks
Directional Preference Alignment
RLHF-Reward-Modeling
1.5k Stars, 103 Forks, 1.5k Watchers
Recipes to train reward model for RLHF.
Online-RLHF
536 Stars, 49 Forks, 536 Watchers
A recipe for online RLHF and online iterative DPO.