Step 2 training gets a negative score and accuracy is below 60%
Hi! While running step 2 reward model training, I got a strange result after one epoch of training:

***** Evaluating reward, Epoch 1/1 *****
chosen_last_scores (higher is better) : -9.388486862182617, acc (higher is better) : 0.5991161465644836
What might be wrong with my training script? The script is as follows:
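One thing worth noting before debugging: a negative chosen_last_scores value is not by itself a bug. The usual InstructGPT-style pairwise reward loss only penalizes the margin between the chosen and rejected scores, so the absolute scores are unconstrained and can drift negative while accuracy still improves. A minimal sketch of that loss (assuming the standard log-sigmoid pairwise formulation, not the exact DeepSpeed-Chat code):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_scores: torch.Tensor,
                         rejected_scores: torch.Tensor) -> torch.Tensor:
    # Only the margin (chosen - rejected) is penalized, so the
    # absolute score values are free to drift negative during training.
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Both scores are strongly negative, yet every chosen score beats its
# rejected counterpart, so the loss is small and accuracy is 100%.
chosen = torch.tensor([-9.4, -8.0])
rejected = torch.tensor([-11.0, -9.5])
loss = pairwise_reward_loss(chosen, rejected)
acc = (chosen > rejected).float().mean()
```

So the number to watch is acc, not the raw score; the score's sign only matters relative to the rejected responses.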
OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
    OUTPUT=./output
fi
if [ "$ZERO_STAGE" == "" ]; then
    ZERO_STAGE=0
fi
mkdir -p $OUTPUT

export CUDA_VISIBLE_DEVICES=1

deepspeed --master_port 29501 --include localhost:1 main.py \
   --model_name_or_path facebook/opt-350m \
   --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets openai/webgpt_comparisons stanfordnlp/SHP \
   --num_padding_at_beginning 1 \
   --gradient_accumulation_steps 2 \
   --zero_stage $ZERO_STAGE \
   --per_device_train_batch_size 8 \
   --per_device_eval_batch_size 16 \
   --num_train_epochs 1 \
   --deepspeed \
   --output_dir $OUTPUT &> $OUTPUT/training.log
Training runs on a single 32 GB Tesla V100 GPU.
Same here. I tried with --num_train_epochs 3.
Results:
***** Running training *****
***** Evaluating reward, Epoch 0/3 *****
chosen_last_scores (higher is better) : 2.172806739807129, acc (higher is better) : 0.4861111044883728
...
Epoch 1/3 with loss 0.6737638572528841
***** Evaluating reward, Epoch 1/3 *****
chosen_last_scores (higher is better) : -6.797516345977783, acc (higher is better) : 0.6073232293128967
...
Epoch 2/3 with loss 0.6125241196086786
***** Evaluating reward, Epoch 2/3 *****
chosen_last_scores (higher is better) : 1.1381808519363403, acc (higher is better) : 0.6123737096786499
...
Epoch 3/3 with loss 0.47712893953608415
***** Evaluating reward, Epoch 3/3 *****
chosen_last_scores (higher is better) : -0.8144770264625549, acc (higher is better) : 0.5934343338012695
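For context on the loss values above: with a randomly initialized reward head the chosen/rejected margin is roughly zero, and the pairwise log-sigmoid loss then equals ln 2 ≈ 0.693. The observed trajectory (0.67 → 0.61 → 0.48) therefore starts near the chance baseline and moves away from it, consistent with accuracy climbing above 50%. A quick check of that baseline (assuming the standard pairwise loss, not the exact DeepSpeed-Chat code):

```python
import math
import torch
import torch.nn.functional as F

# At a zero margin (random reward head), -logsigmoid(0) = ln 2,
# which is the "chance" loss a pairwise reward model starts from.
zero_margin = torch.zeros(4)
chance_loss = -F.logsigmoid(zero_margin).mean()
print(chance_loss.item(), math.log(2))  # both ≈ 0.6931
```

So a loss that stays pinned near 0.69 would mean the model is not learning at all; a loss well below it, as here, means the margins are growing in the right direction on average.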
command:
deepspeed --num_gpus 1 main.py \
--model_name_or_path facebook/opt-350m \
--num_padding_at_beginning 1 \
--gradient_accumulation_steps 2 \
--zero_stage 0 \
--data_path Dahoas/rm-static \
--data_split 2,4,4 \
--num_train_epochs 3 \
--learning_rate 5e-5 \
--deepspeed --output_dir $OUTPUT &> $OUTPUT/training.log
Environment:
- single NVIDIA A6000 48 GB GPU
- PyTorch 2.0
- CUDA 11.7