reinforcement-learning-from-human-feedback topic
Okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
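Safe RLHF casts alignment as constrained policy optimization: maximize the reward-model score while keeping a separate cost-model score under a threshold, typically via a Lagrangian dual. Below is a minimal sketch of the dual (multiplier) update under that framing; the function name, threshold, and learning rate are illustrative assumptions, not the repo's API.

```python
import torch

def lagrangian_penalty_update(reward, cost, lambda_param, cost_limit=0.0, lambda_lr=0.05):
    """One dual step of a Lagrangian-style constrained objective (illustrative sketch).

    reward, cost: mean reward-model / cost-model scores for the current batch.
    lambda_param: non-negative multiplier trading off reward against safety cost.
    Returns the policy objective to maximize and the updated multiplier.
    """
    # Policy objective: reward minus the weighted constraint violation.
    objective = reward - lambda_param * (cost - cost_limit)
    # Dual ascent on the multiplier: it grows while the cost constraint is violated.
    lambda_param = max(0.0, lambda_param + lambda_lr * (cost.item() - cost_limit))
    return objective, lambda_param

# Toy usage with fabricated batch statistics.
reward = torch.tensor(1.2)   # mean reward-model score
cost = torch.tensor(0.3)     # mean cost-model score (> 0 means the constraint is violated here)
obj, lam = lagrangian_penalty_update(reward, cost, lambda_param=1.0)
```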
alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
minichatgpt
An annotated tutorial of the Hugging Face TRL repo for reinforcement learning from human feedback, connecting the PPO and GAE equations to the corresponding lines of code in the PyTorch implementation.
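Since this entry is about mapping the GAE equations onto code, here is a compact, self-contained PyTorch sketch of generalized advantage estimation for a single trajectory; the variable names are generic and not taken from TRL or minichatgpt.

```python
import torch

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory (illustrative sketch).

    rewards: tensor of shape [T] with per-step rewards.
    values:  tensor of shape [T + 1] with value estimates V(s_0..s_T).
    Returns A_t = sum_k (gamma * lam)^k * delta_{t+k} and the value targets.
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Recursive form of GAE: A_t = delta_t + gamma * lam * A_{t+1}
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    # Targets for the value head: returns = advantages + V(s_t)
    returns = advantages + values[:-1]
    return advantages, returns

# Toy usage: sparse terminal reward, as is typical when the reward model scores the full response.
rewards = torch.tensor([0.0, 0.0, 1.0])
values = torch.tensor([0.1, 0.2, 0.4, 0.0])  # V(s_0..s_3), terminal value 0
adv, ret = gae_advantages(rewards, values)
```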
OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
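For the iterative DPO part of that description, here is a minimal sketch of the standard DPO loss on chosen/rejected preference pairs, assuming you already have per-sequence summed log-probabilities from the policy and a frozen reference model; this is the generic objective, not OpenRLHF's actual implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a batch of preference pairs (illustrative sketch).

    Each argument is a tensor of per-sequence summed log-probabilities.
    """
    # Implicit reward margins relative to the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Push the policy to prefer the chosen response over the rejected one.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with fabricated log-probabilities for two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.5, -9.2]))
```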
llm_optimization
A repo for RLHF training and best-of-N (BoN) sampling over LLMs, with support for reward model ensembles.
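Best-of-N sampling with a reward-model ensemble scores N candidate responses under each ensemble member and keeps the candidate with the best aggregated score. The sketch below uses mean-minus-std aggregation (a common hedge against reward hacking); the aggregation rule and all names are illustrative assumptions, not the repo's API.

```python
import torch

def best_of_n(candidates, reward_models, std_penalty=1.0):
    """Pick the best of N candidate responses under a reward-model ensemble (illustrative sketch).

    candidates: list of N response strings.
    reward_models: callables mapping a response to a scalar score.
    Aggregates by mean ensemble score minus a penalty on ensemble disagreement.
    """
    # scores[i, j] = score of candidate j under ensemble member i
    scores = torch.tensor([[rm(c) for c in candidates] for rm in reward_models])
    aggregated = scores.mean(dim=0) - std_penalty * scores.std(dim=0)
    best_idx = int(aggregated.argmax())
    return candidates[best_idx], aggregated[best_idx]

# Toy usage with stand-in reward models (simple length-based heuristics).
candidates = ["short answer", "a somewhat longer answer", "mid answer"]
ensemble = [lambda c: float(len(c)), lambda c: float(len(c.split()))]
best, score = best_of_n(candidates, ensemble)
```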
ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation