Thaumstrial
btw, it's better to randomly shuffle the dataset, or the models will overfit.
A general idea: 1. build a reward model (it takes the prompt and answer and outputs a reward value) based on the flan-t5-xxl encoder, with a fully connected feedforward neural network to convert...
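For concreteness, the architecture sketched above (an encoder producing a pooled representation of prompt + answer, followed by a fully connected head emitting a scalar reward) could look roughly like the snippet below. This is a minimal stand-in, not the actual implementation: the encoder call is replaced by a random vector, and the names (`reward_head`, `HIDDEN`) are hypothetical.

```python
import math
import random

random.seed(0)
HIDDEN = 8  # stand-in for the encoder hidden size (4096 in flan-t5-xxl)

# Two-layer fully connected head: hidden -> hidden -> 1. Weights are random
# stand-ins; a real head would be trained on preference comparisons.
W1 = [[random.gauss(0, 0.02) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [random.gauss(0, 0.02) for _ in range(HIDDEN)]
b2 = 0.0

def reward_head(pooled):
    """Map a pooled encoder state (list of floats) to a scalar reward."""
    h = [math.tanh(sum(p * w for p, w in zip(pooled, col)) + b)
         for col, b in zip(zip(*W1), b1)]
    return sum(x * w for x, w in zip(h, w2)) + b2

# Stand-in for encoder(prompt + answer) followed by mean pooling.
pooled_state = [random.gauss(0, 1) for _ in range(HIDDEN)]
print(reward_head(pooled_state))
```

In the real setup the pooled state would come from the frozen or fine-tuned flan-t5 encoder rather than a random vector.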
@theblackcat102 Got it. When I finish the experiment, I will post the results here so we can decide whether more effort is needed.
@maw501 My experiment is over.

| Model with MLP | WebGPT Accuracy |
| -------------- | --------------- |
| T5-flan-small  | 53.2%           |
| T5-flan-...
@maw501 Do you have a better idea?
@maw501 Hi! 👏 I tried to replicate the reward model based on the InstructGPT paper. I want the reward model to decide which of two responses to the same question is better...
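For reference, the InstructGPT reward model is trained on exactly this kind of pairwise comparison, with the loss `-log(sigmoid(r_chosen - r_rejected))`. A minimal sketch of that objective (function name is hypothetical):

```python
import math

def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), the InstructGPT RM objective."""
    diff = reward_chosen - reward_rejected
    # log1p(exp(-diff)) is an algebraically equivalent form of the loss
    return math.log1p(math.exp(-diff))

# The loss shrinks toward 0 as the margin between the preferred and
# rejected responses grows, and is log(2) when the rewards are equal.
print(pairwise_ranking_loss(2.0, 0.5))  # small loss: chosen scored higher
print(pairwise_ranking_loss(0.5, 2.0))  # larger loss: ranking is wrong
```

Training on this loss pushes the model to assign higher scalar rewards to the responses humans preferred.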
The results shown in the paper are pretty good. Are there any planned or ongoing projects to reproduce them in code?
@sanagno No, just experimenting with the t5-flan encoder combined with the idea of RankGen as the reward model.
@andrewm4894 OK, I'll put it under /docs