Thaumstrial
Use the [flan-t5-xxl](https://huggingface.co/docs/transformers/main/en/model_doc/t5) encoder to train a reward model on the human-feedback dataset [conala](https://conala-corpus.github.io/) (a sketch of one possible setup follows below).
- Code will be placed in model/reward/flant5-xxl.
- Contrast this model with other reward models. Current reward models just...
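A minimal sketch of what an encoder-based reward model could look like. The pooling strategy, the scalar head, and the pairwise ranking loss are assumptions modeled on common RLHF reward models, not the project's confirmed design; a small checkpoint stands in for flan-t5-xxl so the snippet runs on modest hardware.

```python
# Sketch of a reward model built on the Flan-T5 encoder (assumed design:
# mean pooling + linear head + pairwise Bradley-Terry loss).
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel


class T5RewardModel(nn.Module):
    def __init__(self, model_name: str = "google/flan-t5-small"):  # swap in google/flan-t5-xxl for the real run
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.reward_head = nn.Linear(self.encoder.config.d_model, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens to get one vector per sequence.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        return self.reward_head(pooled).squeeze(-1)  # one scalar reward per sequence


def pairwise_loss(chosen_rewards, rejected_rewards):
    # Bradley-Terry style objective: the chosen response should score higher.
    return -torch.nn.functional.logsigmoid(chosen_rewards - rejected_rewards).mean()


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
    model = T5RewardModel()
    chosen = tokenizer(["sort a list in python: sorted(xs)"], return_tensors="pt", padding=True)
    rejected = tokenizer(["sort a list in python: xs.reverse()"], return_tensors="pt", padding=True)
    loss = pairwise_loss(model(**chosen), model(**rejected))
    loss.backward()  # gradients flow through both the encoder and the head
```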
Flan-T5 encoder experiment results
### Describe the feature
The following datasets were not found to be supported in the [readme](https://github.com/hpcaitech/ColossalAI/tree/main/applications/ChatGPT/examples) for training the reward model:
- [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
- [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
- [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)

Are there any plans for that?...
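Independent of official support in the examples, these comparison datasets can be adapted by hand. A minimal sketch, assuming the standard Hugging Face `datasets` API and the published field layout of openai/webgpt_comparisons (`question.full_text`, `answer_0`/`answer_1`, `score_0`/`score_1`); the chosen/rejected output format is an assumption about what a reward-model trainer expects.

```python
# Sketch: convert openai/webgpt_comparisons into chosen/rejected
# preference pairs for reward-model training.
from datasets import load_dataset


def to_preference_pair(example):
    prompt = example["question"]["full_text"]
    # Route the higher-scored answer to "chosen"; ties carry no signal.
    if example["score_0"] > example["score_1"]:
        chosen, rejected = example["answer_0"], example["answer_1"]
    else:
        chosen, rejected = example["answer_1"], example["answer_0"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


dataset = load_dataset("openai/webgpt_comparisons", split="train")
# Drop tied comparisons, then map each one into a preference pair.
pairs = (
    dataset.filter(lambda ex: ex["score_0"] != ex["score_1"])
    .map(to_preference_pair, remove_columns=dataset.column_names)
)
print(pairs[0]["prompt"][:80], "...")
```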