preference-learning topic
tournesol
Free and open-source code for the https://tournesol.app platform. Meet the community on Discord: https://discord.gg/WvcSG55Bf3
magical
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
SAN-NaviSTAR
This repository contains the source code for our paper: "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning". For more details, please refe...
reward-bench
RewardBench: the first evaluation tool for reward models.
metis
Python-based GUI for collecting chemists' feedback on molecules
ICSFSurvey
A comprehensive survey on Internal Consistency and Self-Feedback in Large Language Models.
prelude
Aligning LLM Agents by Learning Latent Preference from User Edits
dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards