Step 2 training gets a negative score and accuracy is below 60%
Hi! While running step 2 reward model training, I got a strange result after one epoch of training:

***** Evaluating reward, Epoch 1/1 *****
chosen_last_scores (higher is better) : -9.388486862182617, acc (higher is better) : 0.5991161465644836
What might be wrong with my training script? The script is as follows:
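One thing worth noting before debugging: a negative chosen_last_scores value is not by itself a bug. The usual InstructGPT-style pairwise reward loss only penalizes the margin between the chosen and rejected scores, so the absolute scores are unconstrained and can drift negative while accuracy still improves. A minimal sketch of that loss (assuming the standard log-sigmoid pairwise formulation, not the exact DeepSpeed-Chat code):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_scores: torch.Tensor,
                         rejected_scores: torch.Tensor) -> torch.Tensor:
    # Only the margin (chosen - rejected) is penalized, so the
    # absolute score values are free to drift negative during training.
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Both scores are strongly negative, yet every chosen score beats its
# rejected counterpart, so the loss is small and accuracy is 100%.
chosen = torch.tensor([-9.4, -8.0])
rejected = torch.tensor([-11.0, -9.5])
loss = pairwise_reward_loss(chosen, rejected)
acc = (chosen > rejected).float().mean()
```

So the number to watch is acc, not the raw score; the score's sign only matters relative to the rejected responses.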
OUTPUT=$1
ZERO_STAGE=$2
if [ "$OUTPUT" == "" ]; then
    OUTPUT=./output
fi
if [ "$ZERO_STAGE" == "" ]; then
    ZERO_STAGE=0
fi
mkdir -p $OUTPUT

export CUDA_VISIBLE_DEVICES=1

deepspeed --master_port 29501 --include localhost:1 main.py \
   --model_name_or_path facebook/opt-350m \
   --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets openai/webgpt_comparisons stanfordnlp/SHP \
   --num_padding_at_beginning 1 \
   --gradient_accumulation_steps 2 \
   --zero_stage $ZERO_STAGE \
   --per_device_train_batch_size 8 \
   --per_device_eval_batch_size 16 \
   --num_train_epochs 1 \
   --deepspeed \
   --output_dir $OUTPUT &> $OUTPUT/training.log
Training runs on a single 32 GB Tesla V100 GPU.
Same here. I tried with --num_train_epochs 3.
Results:
***** Running training *****
***** Evaluating reward, Epoch 0/3 *****
chosen_last_scores (higher is better) : 2.172806739807129, acc (higher is better) : 0.4861111044883728
...
Epoch 1/3 with loss 0.6737638572528841
***** Evaluating reward, Epoch 1/3 *****
chosen_last_scores (higher is better) : -6.797516345977783, acc (higher is better) : 0.6073232293128967
...
Epoch 2/3 with loss 0.6125241196086786
***** Evaluating reward, Epoch 2/3 *****
chosen_last_scores (higher is better) : 1.1381808519363403, acc (higher is better) : 0.6123737096786499
...
Epoch 3/3 with loss 0.47712893953608415
***** Evaluating reward, Epoch 3/3 *****
chosen_last_scores (higher is better) : -0.8144770264625549, acc (higher is better) : 0.5934343338012695
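For context on the loss values above: with a randomly initialized reward head the chosen/rejected margin is roughly zero, and the pairwise log-sigmoid loss then equals ln 2 ≈ 0.693. The observed trajectory (0.67 → 0.61 → 0.48) therefore starts near the chance baseline and moves away from it, consistent with accuracy climbing above 50%. A quick check of that baseline (assuming the standard pairwise loss, not the exact DeepSpeed-Chat code):

```python
import math
import torch
import torch.nn.functional as F

# At a zero margin (random reward head), -logsigmoid(0) = ln 2,
# which is the "chance" loss a pairwise reward model starts from.
zero_margin = torch.zeros(4)
chance_loss = -F.logsigmoid(zero_margin).mean()
print(chance_loss.item(), math.log(2))  # both ≈ 0.6931
```

So a loss that stays pinned near 0.69 would mean the model is not learning at all; a loss well below it, as here, means the margins are growing in the right direction on average.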
command:
deepspeed --num_gpus 1 main.py \
--model_name_or_path facebook/opt-350m \
--num_padding_at_beginning 1 \
--gradient_accumulation_steps 2 \
--zero_stage 0 \
--data_path Dahoas/rm-static \
--data_split 2,4,4 \
--num_train_epochs 3 \
--learning_rate 5e-5 \
--deepspeed --output_dir $OUTPUT &> $OUTPUT/training.log
Environment:
- single NVIDIA A6000 48 GB GPU
- PyTorch 2.0
- CUDA 11.7