[FEATURE] Add RLHF training

Open pascal-pfeiffer opened this issue 2 years ago • 0 comments

As a next step, we should add RLHF training to continue fine-tuning.

This may include two steps.

Open questions: data labeling (the human feedback)?
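
On the labeling question, one common option is to collect pairwise preferences (a labeler picks the better of two answers to the same prompt) and train a reward model on them. Below is a minimal sketch of that idea, assuming a Bradley-Terry style pairwise loss. The `PreferencePair` schema and the function name are hypothetical and are not part of this repository; the reward scores would come from whatever reward model we end up training.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One human-labeled comparison (hypothetical schema, for illustration)."""
    prompt: str
    chosen: str    # answer the labeler preferred
    rejected: str  # answer the labeler ranked lower


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Example record and loss values for a correctly and an incorrectly ranked pair.
pair = PreferencePair(
    prompt="Explain overfitting in one sentence.",
    chosen="The model memorizes training data and generalizes poorly.",
    rejected="It is when training is too fast.",
)
loss_correct = pairwise_reward_loss(reward_chosen=2.0, reward_rejected=0.0)
loss_wrong = pairwise_reward_loss(reward_chosen=0.0, reward_rejected=2.0)
```

The loss is small when the reward model already scores the preferred answer higher and grows when the ranking is reversed, so the labeling effort reduces to collecting `(prompt, chosen, rejected)` triples.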

Better models

Apr 29 '23 18:04 pascal-pfeiffer