learning-from-human-feedback topic

List learning-from-human-feedback repositories

chain-of-hindsight

Stars: 214
Forks: 17

Chain-of-Hindsight, A Scalable RLHF Method

exact-optimization

Stars: 45
Forks: 0

ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment