[FEATURE]: Support more dataset and custom special token for reward-model training

Open thaumstrial opened this issue 2 years ago • 4 comments

Describe the feature

The README does not list the following datasets as supported for reward-model training:

openai/summarize_from_feedback, openai/webgpt_comparisons, Dahoas/instruct-synthetic-prompt-responses

Are there any plans to support them? I have implemented these locally. Would the project like to have this code?
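For illustration, here is a minimal sketch of how one comparison-style record could be mapped to a (prompt, chosen, rejected) triple for reward-model training. This is not the code mentioned above. The field names (`question["full_text"]`, `answer_0`, `answer_1`, `score_0`, `score_1`) are my assumption about the openai/webgpt_comparisons layout and should be checked against the dataset card. `to_preference_pair` is a hypothetical helper, and the records are toy data.

```python
from typing import Optional


def to_preference_pair(record: dict) -> Optional[dict]:
    """Map one comparison record to a prompt/chosen/rejected triple.

    Assumed layout (verify against the dataset card): the question text is in
    record["question"]["full_text"], the two candidate answers are in
    answer_0 / answer_1, and their scores are in score_0 / score_1.
    """
    score_0, score_1 = record["score_0"], record["score_1"]
    if score_0 == score_1:
        return None  # a tie carries no preference signal, so skip it
    prompt = record["question"]["full_text"]
    if score_0 > score_1:
        chosen, rejected = record["answer_0"], record["answer_1"]
    else:
        chosen, rejected = record["answer_1"], record["answer_0"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Toy records that imitate the assumed layout (not real dataset rows).
records = [
    {
        "question": {"full_text": "What is RLHF?"},
        "answer_0": "A training method that uses human feedback.",
        "answer_1": "No idea.",
        "score_0": 1.0,
        "score_1": -1.0,
    },
    {
        "question": {"full_text": "A tied comparison"},
        "answer_0": "a",
        "answer_1": "b",
        "score_0": 0.0,
        "score_1": 0.0,
    },
]

pairs = [p for p in map(to_preference_pair, records) if p is not None]
```

Other datasets would need their own small adapter of this kind, since each one stores the preferred answer differently.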

Also, in reward_dataset the special tokens are hardcoded, which lacks generality. We could let users supply custom special tokens so the dataset can adapt to different tokenizers.
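A rough sketch of what a configurable special token could look like is below. It is not the project's actual reward_dataset code. `PairwiseRewardDataset`, its parameters, and the character-level toy tokenizer are all hypothetical and exist only to show the end-of-sequence token being passed in instead of hardcoded.

```python
from typing import Callable, Dict, List


class PairwiseRewardDataset:
    """Hypothetical pairwise dataset whose end token is a parameter."""

    def __init__(
        self,
        pairs: List[Dict[str, str]],
        tokenize: Callable[[str], List[int]],
        eos_token: str = "<|endoftext|>",  # caller picks what fits the tokenizer
    ) -> None:
        self.items = []
        for pair in pairs:
            # Append the caller-supplied token rather than a fixed literal.
            chosen_text = pair["prompt"] + pair["chosen"] + eos_token
            rejected_text = pair["prompt"] + pair["rejected"] + eos_token
            self.items.append(
                {
                    "chosen_ids": tokenize(chosen_text),
                    "rejected_ids": tokenize(rejected_text),
                }
            )

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Dict[str, List[int]]:
        return self.items[index]


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (one id per character), for illustration only."""
    return [ord(ch) for ch in text]


example_pairs = [
    {"prompt": "Q: hi\n", "chosen": "A: hello", "rejected": "A: go away"}
]
dataset = PairwiseRewardDataset(example_pairs, toy_tokenize, eos_token="</s>")
```

With a Hugging Face tokenizer, a token that is not already in the vocabulary would typically also need to be registered with `tokenizer.add_special_tokens(...)`, followed by `model.resize_token_embeddings(len(tokenizer))` on the model.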

thaumstrial avatar Mar 10 '23 07:03 thaumstrial

By the way, it would be better to randomly shuffle the dataset; otherwise the models will overfit.

thaumstrial avatar Mar 11 '23 05:03 thaumstrial

Thanks for your feedback. We changed our RM training code a lot last week, and it will be released soon, within this week.

ht-zhou avatar Mar 13 '23 01:03 ht-zhou

> By the way, it would be better to randomly shuffle the dataset; otherwise the models will overfit.

RM fine-tuning on these datasets is set to 1 epoch, following the RLHF papers. Anyway, shuffling the dataset is necessary, and we will fix it in our upcoming PR.
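As a minimal sketch of the shuffling step (not the actual fix in the PR), the preference pairs can be shuffled once with a fixed seed before the single-epoch pass. `shuffle_pairs` and the seed value are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_pairs(pairs: Sequence[T], seed: int = 42) -> List[T]:
    """Return a shuffled copy; the input sequence is left unchanged."""
    shuffled = list(pairs)
    # A dedicated Random instance keeps the order reproducible for a given
    # seed without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    return shuffled


ordered = list(range(10))  # stand-in for a list of preference pairs
shuffled_once = shuffle_pairs(ordered, seed=0)
shuffled_again = shuffle_pairs(ordered, seed=0)
```

If training goes through a PyTorch `DataLoader`, passing `shuffle=True` there would be the more usual route; the copy-based version above just makes the idea explicit.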

ht-zhou avatar Mar 13 '23 01:03 ht-zhou

> Describe the feature
>
> The README does not list the following datasets as supported for reward-model training:
>
> openai/summarize_from_feedback, openai/webgpt_comparisons, Dahoas/instruct-synthetic-prompt-responses
>
> Are there any plans to support them? I have implemented these locally. Would the project like to have this code?
>
> Also, in reward_dataset the special tokens are hardcoded, which lacks generality. We could let users supply custom special tokens so the dataset can adapt to different tokenizers.

We will also support these datasets soon.

ht-zhou avatar Mar 13 '23 01:03 ht-zhou

@thaumstrial Could you share your code? Thank you very much. The reward dataset does not seem to fit tasks other than chat.

zhohuiluo avatar Apr 13 '23 11:04 zhohuiluo

We will update this further soon. Stay tuned!

binmakeswell avatar May 05 '23 04:05 binmakeswell