h2o-llmstudio
[FEATURE] Add RLHF training
🚀 Feature
As a next step, we should add RLHF training so that models can be fine-tuned further after the initial fine-tuning.
This may involve two steps:
- Train a reward model
- Run RL using the reward model trained in the first step
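To make the two steps a bit more concrete, here is a rough sketch of the objectives that are commonly used in RLHF setups. This is not an existing h2o-llmstudio API; the function names, the `beta` parameter, and the choice of a pairwise (Bradley-Terry style) loss and a KL-penalized reward are assumptions for illustration only.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Step 1 (assumed form): pairwise loss for the reward model.

    Computes -log(sigmoid(r_chosen - r_rejected)), which is small when the
    reward model scores the human-preferred answer above the rejected one.
    """
    diff = r_chosen - r_rejected
    # Numerically stable softplus(-diff) = log(1 + exp(-diff))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def kl_shaped_reward(reward: float, logp_policy: float, logp_ref: float,
                     beta: float = 0.1) -> float:
    """Step 2 (assumed form): reward signal used during RL.

    Subtracts a penalty proportional to the log-probability ratio between
    the policy being trained and the frozen reference model, so the policy
    does not drift too far from the fine-tuned starting point.
    """
    return reward - beta * (logp_policy - logp_ref)


if __name__ == "__main__":
    # Reward model prefers the chosen answer: low loss.
    print(pairwise_reward_loss(2.0, 0.0))
    # Reward model prefers the rejected answer: high loss.
    print(pairwise_reward_loss(0.0, 2.0))
    # Reward after the KL-style penalty.
    print(kl_shaped_reward(1.0, logp_policy=-1.0, logp_ref=-2.0, beta=0.5))
```

In a real implementation the scalar rewards would come from a model head and the losses would be computed over batches of tensors; the plain-float version is only meant to pin down the math for discussion.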
Open question: how do we handle data labeling (the human feedback)?
Motivation
Better models