RLHFlow
Results: 3 repositories owned by RLHFlow

Directional-Preference-Alignment
45 Stars, 2 Forks
Directional Preference Alignment

RLHF-Reward-Modeling
738 Stars, 62 Forks
Recipes to train reward model for RLHF.

Online-RLHF
381 Stars, 44 Forks
A recipe for online RLHF and online iterative DPO.