ColossalAI [FEATURE]: Support more datasets and custom special tokens for reward-model training

Describe the feature

The README does not list the following datasets as supported for reward-model training:

openai/summarize_from_feedback openai/webgpt_comparisons Dahoas/instruct-synthetic-prompt-responses

Are there any plans to support them? I have implemented them locally; would the project be interested in this code?
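As a rough illustration of what such dataset support involves, here is a minimal sketch that maps comparison-style records to prompt/chosen/rejected triples. It is not the code mentioned above. The function name `webgpt_to_pair` is hypothetical, and the field names (`question.full_text`, `answer_0`, `answer_1`, `score_0`, `score_1`) are assumptions about the openai/webgpt_comparisons schema that should be checked against the dataset card. The sample rows are hand-written stand-ins, not real data.

```python
from typing import Dict, Optional


def webgpt_to_pair(record: Dict) -> Optional[Dict[str, str]]:
    """Map one comparison record to a prompt/chosen/rejected triple.

    The field names used here are assumed, not confirmed by the thread.
    """
    score_0, score_1 = record["score_0"], record["score_1"]
    if score_0 == score_1:
        return None  # a tie carries no preference signal, so skip it
    if score_0 > score_1:
        chosen, rejected = record["answer_0"], record["answer_1"]
    else:
        chosen, rejected = record["answer_1"], record["answer_0"]
    return {
        "prompt": record["question"]["full_text"],
        "chosen": chosen,
        "rejected": rejected,
    }


# Hand-written sample rows standing in for real dataset rows.
sample = [
    {"question": {"full_text": "Why is the sky blue?"},
     "answer_0": "Rayleigh scattering.", "answer_1": "It reflects the sea.",
     "score_0": 1.0, "score_1": -1.0},
    {"question": {"full_text": "What is 2 + 2?"},
     "answer_0": "4", "answer_1": "Four",
     "score_0": 0.0, "score_1": 0.0},
]

# Keep only the records that express a preference.
pairs = [p for p in map(webgpt_to_pair, sample) if p is not None]
```

The other two datasets would need their own small mapping functions, since each one stores the preferred and rejected responses under different fields.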

Also, in reward_dataset the special tokens are hard-coded, which lacks generality. We could let users supply custom special tokens so that the code adapts to different tokenizers.
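One way to picture the suggestion is to pass the end token in as a parameter instead of embedding it in the formatting code. The sketch below is illustrative only: `format_pair` and its parameters are hypothetical names, not part of the ColossalAI code, and the two token strings are just examples of what different tokenizers might use.

```python
from typing import Dict


def format_pair(prompt: str, chosen: str, rejected: str,
                end_token: str) -> Dict[str, str]:
    """Build the two training strings, appending a caller-supplied end token.

    Nothing tokenizer-specific is hard-coded; the caller decides which
    special token marks the end of a sequence.
    """
    return {
        "chosen": prompt + chosen + end_token,
        "rejected": prompt + rejected + end_token,
    }


# The same function serves tokenizers with different end-of-sequence tokens.
gpt_style = format_pair("Q: hi\nA: ", "hello", "go away",
                        end_token="<|endoftext|>")
other_style = format_pair("Q: hi\nA: ", "hello", "go away",
                          end_token="</s>")
```

With a Hugging Face tokenizer, the value passed as `end_token` would typically come from the tokenizer's `eos_token` attribute rather than a literal string.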

Mar 10 '23 07:03 thaumstrial

By the way, it would be better to randomly shuffle the dataset; otherwise the models will tend to overfit.
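A minimal sketch of seeded shuffling, assuming the examples fit in memory as a Python sequence. The helper name `shuffled` and the default seed are hypothetical choices for illustration, not taken from the project.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffled(examples: Sequence[T], seed: int = 42) -> List[T]:
    """Return a shuffled copy of the examples; the input is left untouched.

    A dedicated Random instance keeps the order reproducible for a given
    seed without affecting the global random state.
    """
    rng = random.Random(seed)
    out = list(examples)
    rng.shuffle(out)
    return out


data = list(range(10))
mixed = shuffled(data, seed=0)
```

When training with PyTorch, passing `shuffle=True` to `DataLoader` is the more common way to get a fresh order each epoch; the helper above only shows the idea in isolation.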

Mar 11 '23 05:03 thaumstrial

Thanks for your feedback. We changed our RM training code substantially last week, and the update will be released later this week.

Mar 13 '23 01:03 ht-zhou

Following the RLHF papers, RM fine-tuning on these datasets is set to 1 epoch. Even so, shuffling the dataset is necessary, and we will fix this in our upcoming PR.

Mar 13 '23 01:03 ht-zhou

We will also support these datasets soon.

Mar 13 '23 01:03 ht-zhou

@thaumstrial Could you share your code? Thank you very much. The reward dataset does not seem to fit tasks other than chat.

Apr 13 '23 11:04 zhohuiluo

We will update this further soon. Stay tuned!

May 05 '23 04:05 binmakeswell