safe-reinforcement-learning-from-human-feedback topic

safe-rlhf

Stars: 1.3k
Forks: 119

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback