Open-Assistant
Train a reward model based on Instructor
Add a scalar last-token reward-head to Instructor and train it on human-feedback pairs (good vs. bad) from the openai/summarize-from-feedback dataset (see the Learning to summarize from human feedback paper for details about the objective). A rough sketch of this setup follows the checklist below.
- place your training code in a new `model/reward/instructor` folder
- please use wandb for experiment tracking, measure at least loss + accuracy (based on score of good example > bad example)
- try to avoid modifying the original model; if possible, aggregate the existing model (i.e. add the existing model as a member of the new model class)
- compare with results from #78
Background: We want to implement the RLHF stack for Open-Assistant in parallel to our data collection effort. As a temporary stand-in we use existing RLHF datasets like OpenAI's Learning to Summarize data for model development. Instructor was proposed as a promising base-model candidate for a reward model.
You could use bits of the reward-model training code from a model I trained a couple of weeks ago, which contains data-loading code for the summarize-from-feedback data, as inspiration. If you like, you can of course use a framework like pytorch_lightning.
I had one here, would be glad to contribute
Nice, I assigned the issue to you.
Since you already trained an RM based on bigscience/bloomz-560m, do you think you could add loading code (e.g. see my linked code above) for the OpenAI summaries and train it on them? That would give us a datapoint for another RM. Did you record training metrics with wandb? Could you make it public?
Sure! I will train a variety of models and push them to Hugging Face (if that's fine). I already have a WebGPT RM model on Hugging Face.
We currently have WebGPT, Anthropic, summarization, xP3, and unnatural instructions as possible datasets for reward-model evaluation. I suggest that we get some data for summarization first on all models. If you or someone else has time to test the other datasets, that would be great too.
> I will train a variety of models and push them to Hugging Face (if that's fine).
Of course! Could you submit a PR with your training code so that we have it in the repo?