h2o-llmstudio

[FEATURE] Add RLHF training

Open pascal-pfeiffer opened this issue 2 years ago • 0 comments

🚀 Feature

As a next step, we should add RLHF (reinforcement learning from human feedback) training to continue fine-tuning models.

This may include two steps.

  1. Train a reward model
  2. Run RL using the reward model from step 1.
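
To make the two steps concrete, here is a toy sketch in plain Python. It is not h2o-llmstudio code: the linear reward model, the two hand-made features, the pairwise logistic (Bradley-Terry style) loss, and the REINFORCE update over a fixed set of candidate answers are all assumptions chosen for illustration. A real implementation would use an LLM for both the reward model and the policy, and typically an algorithm such as PPO for step 2.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reward(w, feats):
    # Hypothetical linear reward model: dot product of weights and features.
    return sum(wi * fi for wi, fi in zip(w, feats))


def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Step 1: fit w so that chosen answers score higher than rejected ones.

    pairs is a list of (chosen_feats, rejected_feats).
    The loss per pair is -log(sigmoid(r_chosen - r_rejected)).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            g = 1.0 - sigmoid(margin)  # negative loss gradient w.r.t. margin
            for i in range(dim):
                w[i] += lr * g * (chosen[i] - rejected[i])
    return w


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(candidates, w, steps=500, lr=0.1, seed=0):
    """Step 2: REINFORCE on a softmax policy over fixed candidate answers,
    using the learned reward model as the reward signal."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    rewards = [reward(w, c) for c in candidates]
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(candidates)), weights=probs)[0]
        baseline = sum(p * r for p, r in zip(probs, rewards))
        advantage = rewards[a] - baseline
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
    return softmax(logits)


if __name__ == "__main__":
    # Made-up features: [helpfulness, rudeness]. Each pair is (chosen, rejected).
    pairs = [([1.0, 0.0], [0.0, 1.0]),
             ([0.8, 0.1], [0.3, 0.9]),
             ([0.9, 0.2], [0.2, 0.2])]
    w = train_reward_model(pairs, dim=2)
    candidates = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
    print("reward weights:", w)
    print("policy after RL:", reinforce(candidates, w))
```

The point of the sketch is the data flow: human preferences train the reward model, and the policy is then optimized against that model rather than against human labels directly.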

Open question: how do we collect the labeled data, i.e. the human feedback?
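
One possible shape for the labeled data, sketched under assumptions: annotators rank several answers to the same prompt, and each ranking is expanded into (chosen, rejected) pairs for reward model training. The `PreferencePair` class, its field names, and `pairs_from_ranking` are hypothetical and not an existing h2o-llmstudio format.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PreferencePair:
    # Hypothetical record: one human judgment that `chosen` beats `rejected`.
    prompt: str
    chosen: str
    rejected: str


def pairs_from_ranking(prompt, ranked_answers):
    """Expand a human ranking (best answer first) into all pairwise preferences."""
    return [
        PreferencePair(prompt, better, worse)
        for better, worse in combinations(ranked_answers, 2)
    ]


if __name__ == "__main__":
    ranking = ["detailed answer", "short answer", "off-topic answer"]
    for pair in pairs_from_ranking("What is RLHF?", ranking):
        print(pair)
```

A ranking of n answers yields n * (n - 1) / 2 pairs, so ranking a few answers per prompt produces more training signal per labeling task than a single A/B comparison.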

Motivation

Better models, i.e. models whose outputs are better aligned with human preferences.

pascal-pfeiffer · Apr 29 '23 18:04