DeepSpeedExamples

Wrong code for calculating the score in step2 evaluation

Open nepetune233 opened this issue 2 years ago • 0 comments

In applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py, lines 254-258:

scores += outputs["chosen_mean_scores"].mean().float()
if step == 99:  # For faster evaluation and debugging
    break
acc = correct_predictions / total_predictions
scores = scores / (step + 1)

It seems we are trying to calculate the average accuracy. For the scores, however, the calculation looks wrong, because we modify the scores value in place on every step with:

scores += outputs["chosen_mean_scores"].mean().float()
scores /= (step + 1)

I think scores = scores / (step + 1) should be placed outside the for loop. Otherwise the running sum is divided again on every iteration, so the reported score gets smaller and smaller as step increases.
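To make the difference concrete, here is a minimal sketch in plain Python (no torch, no model). The function names and the sample values are hypothetical and only illustrate the two placements of the division; this is not the code from main.py.

```python
def mean_divided_inside_loop(batch_means):
    """Divide the running sum on every step (the behavior described above)."""
    scores = 0.0
    for step, batch_mean in enumerate(batch_means):
        scores += batch_mean
        scores = scores / (step + 1)  # repeated division shrinks the running sum
    return scores


def mean_divided_after_loop(batch_means):
    """Accumulate first, then divide once by the number of steps."""
    scores = 0.0
    step = -1  # stays -1 if there are no batches
    for step, batch_mean in enumerate(batch_means):
        scores += batch_mean
    return scores / (step + 1) if step >= 0 else 0.0


# Hypothetical per-batch mean scores; the true average is 2.0.
batch_means = [2.0, 2.0, 2.0, 2.0]
inside = mean_divided_inside_loop(batch_means)
after = mean_divided_after_loop(batch_means)
print(f"divided inside the loop: {inside:.4f}")
print(f"divided after the loop:  {after:.4f}")
```

With identical per-batch means, dividing after the loop recovers the true average, while dividing inside the loop returns a smaller value that keeps shrinking as more batches are added.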

nepetune233 · Apr 17 '23 12:04