DeepSpeedExamples
Wrong score calculation in step2 evaluation
In applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py, lines 254-258:
scores += outputs["chosen_mean_scores"].mean().float()
if step == 99:  # For faster evaluation and debugging
    break
acc = correct_predictions / total_predictions
scores = scores / (step + 1)
It seems we are trying to calculate the average accuracy. For the scores, however, the calculation seems wrong, because we modify the scores value in place with
scores += outputs["chosen_mean_scores"].mean().float()
scores /= (step + 1)
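I think scores = scores / (step + 1) should be placed outside the for loop. Otherwise the accumulated value is divided again on every iteration, so it gets smaller and smaller as step increases and is no longer the mean of the per-batch scores.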
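
To illustrate the concern, here is a minimal sketch that compares the two placements of the division. It assumes the division really does run on every iteration, as described above. The function names and the plain Python floats are hypothetical stand-ins for the per-batch value outputs["chosen_mean_scores"].mean().float(); this is not the code from main.py.

```python
def mean_divided_inside_loop(batch_means):
    """Divide the running total on every step (the behavior reported above)."""
    scores = 0.0
    for step, value in enumerate(batch_means):
        scores += value
        scores = scores / (step + 1)  # rescales the accumulated value each step
    return scores


def mean_divided_after_loop(batch_means):
    """Accumulate first, then divide once after the loop has finished."""
    scores = 0.0
    step = -1  # keeps step defined if batch_means is empty
    for step, value in enumerate(batch_means):
        scores += value
    return scores / (step + 1) if step >= 0 else 0.0


# Hypothetical per-batch mean scores; every batch has the same mean of 2.0.
batch_means = [2.0, 2.0, 2.0, 2.0]

print(f"{mean_divided_inside_loop(batch_means):.4f}")  # 0.8333
print(f"{mean_divided_after_loop(batch_means):.4f}")   # 2.0000
```

With identical batch means of 2.0, dividing once after the loop returns 2.0, while dividing on every step returns about 0.83, and the gap grows with the number of steps.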