Thaumstrial

Results 9 comments of Thaumstrial

btw, it's better to randomly shuffle the dataset, or the models will overfit.
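A minimal sketch of the shuffling step, assuming the dataset fits in memory as a plain Python list. The helper name `shuffled_copy` and the `seed` parameter are hypothetical, chosen only for illustration; a fixed seed keeps the order reproducible between runs.

```python
import random


def shuffled_copy(examples, seed=0):
    """Return a shuffled copy of `examples` without modifying the input."""
    rng = random.Random(seed)   # private RNG, so global random state is untouched
    shuffled = list(examples)   # copy first: rng.shuffle works in place
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    data = [f"example-{i}" for i in range(10)]
    print(shuffled_copy(data, seed=42))
```

Shuffling a copy rather than the original list makes it easy to compare the ordered and shuffled versions while debugging.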

A general idea: 1. build a reward model (which takes the prompt and answer and outputs a reward value) based on the flant5-xxl encoder with a fully connected feedforward neural network to convert...

@theblackcat102 Got it. When I finish the experiment, I will put the results here to consider whether more effort is needed.

@maw501 My experiment is over

| Model with MLP | WebGPT Accuracy |
| ----------- | --------- |
| T5 - flan - small | 53.2% |
| T5 - flan -...

@maw501 Do you have a better idea?

@maw501 Hi! 👏 I tried to replicate the reward model based on the InstructGPT paper. I want the reward model to decide which of two responses to the same question was better...
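For reference, the pairwise comparison described above is usually trained with the ranking loss from the InstructGPT paper, `-log(sigmoid(r_chosen - r_rejected))`. The sketch below works on plain floats and is not the code used in the experiment; the function names are hypothetical, and the accuracy helper assumes that "accuracy" means the fraction of pairs in which the preferred response gets the higher reward.

```python
import math


def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) stably."""
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(diff)) equals softplus(-diff); branch to avoid overflow in exp.
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def pairwise_accuracy(reward_pairs):
    """Fraction of (chosen, rejected) reward pairs that are ranked correctly."""
    pairs = list(reward_pairs)
    if not pairs:
        raise ValueError("reward_pairs must not be empty")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Made-up reward values, for illustration only.
    pairs = [(1.0, 0.0), (0.2, 0.5), (3.0, 1.0), (0.1, 0.0)]
    print(pairwise_ranking_loss(1.0, 0.0))
    print(pairwise_accuracy(pairs))
```

The loss shrinks as the reward for the preferred response rises above the other one, and equals `log(2)` when the two rewards are tied.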

The results shown in the paper are pretty good. Are there any planned or ongoing projects to reproduce the code?

@sanagno No, just an experiment with the t5-flan encoder combined with the idea of rankgen as the reward model

@andrewm4894 OK, I'll put it under /docs