trl topics

llama-trl

180

Stars

23

Forks

Watchers

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

jasonvanf

adapter

chatgpt

gpt

gpt-4

llm_rlhf

27

Stars

2

Forks

Watchers

Reinforcement learning training for LLMs such as GPT-2, LLaMA, BLOOM, and others

ssbuild

llm

llm-rlhf

lora

reward

notus

159

Stars

14

Forks

Watchers

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

argilla-io

alignment-handbook

dpo

fine-tuning

lm-alignment

Dutch-LLMs

33

Stars

1

Forks

33

Watchers

Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.

RobinSmits

alpaca

dpo

large-language-models

open-llama

Simple-Trl-Training

25

Stars

1

Forks

Watchers

Fine-tune large language models with the DPO algorithm; simple and easy to get started.

sugarandgugu

dpo

llm

rlhf

simple

vlm-grpo

78

Stars

7

Forks

78

Watchers

An implementation of GRPO for training Unsloth's VLMs

GAD-cell

grpo

grpotrainer

huggingface

reinforcement-learning

simpleR1

30

Stars

2

Forks

30

Watchers

simpleR1: A Simple Framework for Training R1-like Models

yflyzhang

deepseek-r1

grpo

grpotrainer

ppo