preference-learning topic
tournesol
Free and open-source code of the https://tournesol.app platform. Meet the community on Discord: https://discord.gg/WvcSG55Bf3
magical
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
SAN-NaviSTAR
This repository contains the source code for our paper: "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning". For more details, please refe...
reward-bench
RewardBench: the first evaluation tool for reward models.
metis
Python-based GUI for collecting chemists' feedback on molecules
ICSFSurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
prelude
Aligning LLM Agents by Learning Latent Preference from User Edits
dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
awesome-direct-preference-optimization
A Survey of Direct Preference Optimization (DPO)