Thaumstrial
Use the [flan-t5-xxl](https://huggingface.co/docs/transformers/main/en/model_doc/t5) encoder to train a reward model on the human-feedback dataset [conala](https://conala-corpus.github.io/) (a sketch of one possible setup follows below).
- Code will be placed in model/reward/flant5-xxl.
- Contrast this model with other reward models. Current reward models just...
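A minimal sketch of what an encoder-based reward model could look like. The pooling strategy, the scalar head, and the pairwise ranking loss are assumptions modeled on common RLHF reward models, not the project's confirmed design; a small checkpoint stands in for flan-t5-xxl so the snippet runs on modest hardware.

```python
# Sketch of a reward model built on the Flan-T5 encoder (assumed design:
# mean pooling + linear head + pairwise Bradley-Terry loss).
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel


class T5RewardModel(nn.Module):
    def __init__(self, model_name: str = "google/flan-t5-small"):  # swap in google/flan-t5-xxl for the real run
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.reward_head = nn.Linear(self.encoder.config.d_model, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens to get one vector per sequence.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        return self.reward_head(pooled).squeeze(-1)  # one scalar reward per sequence


def pairwise_loss(chosen_rewards, rejected_rewards):
    # Bradley-Terry style objective: the chosen response should score higher.
    return -torch.nn.functional.logsigmoid(chosen_rewards - rejected_rewards).mean()


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
    model = T5RewardModel()
    chosen = tokenizer(["sort a list in python: sorted(xs)"], return_tensors="pt", padding=True)
    rejected = tokenizer(["sort a list in python: xs.reverse()"], return_tensors="pt", padding=True)
    loss = pairwise_loss(model(**chosen), model(**rejected))
    loss.backward()  # gradients flow through both the encoder and the head
```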
Flan-T5 encoder experiment results
### Describe the feature
The following datasets were not found to be supported in the [readme](https://github.com/hpcaitech/ColossalAI/tree/main/applications/ChatGPT/examples) for training the reward model:
- [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
- [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
- [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)

Are there any plans for that?...
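Independent of official support in the examples, these comparison datasets can be adapted by hand. A minimal sketch, assuming the standard Hugging Face `datasets` API and the published field layout of openai/webgpt_comparisons (`question.full_text`, `answer_0`/`answer_1`, `score_0`/`score_1`); the chosen/rejected output format is an assumption about what a reward-model trainer expects.

```python
# Sketch: convert openai/webgpt_comparisons into chosen/rejected
# preference pairs for reward-model training.
from datasets import load_dataset


def to_preference_pair(example):
    prompt = example["question"]["full_text"]
    # Route the higher-scored answer to "chosen"; ties carry no signal.
    if example["score_0"] > example["score_1"]:
        chosen, rejected = example["answer_0"], example["answer_1"]
    else:
        chosen, rejected = example["answer_1"], example["answer_0"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


dataset = load_dataset("openai/webgpt_comparisons", split="train")
# Drop tied comparisons, then map each one into a preference pair.
pairs = (
    dataset.filter(lambda ex: ex["score_0"] != ex["score_1"])
    .map(to_preference_pair, remove_columns=dataset.column_names)
)
print(pairs[0]["prompt"][:80], "...")
```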