alpaca-rlhf

Training hangs at evaluation_reward in step 2

Open · murphypei opened this issue 2 years ago · 4 comments

Firstly, thank you for your contributions. During step 2 training, the run consistently hangs at `evaluation_reward`: it stalls but does not exit. Is something wrong? Perhaps the condition `args.global_rank == 0` is unnecessary? Any suggestions would be greatly appreciated. Thank you.

murphypei · May 23 '23 01:05

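The following code works well. `evaluation_reward` now runs on every rank, and only the `wandb.log` call is guarded by `args.global_rank == 0`: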

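```python
if (step + 1) % 100 == 0:
    # Every rank must call evaluation_reward: it needs all processes to participate.
    reward_score, rejected_scores, acc, score_std = evaluation_reward(rm_model, eval_dataloader)
    # Only rank 0 logs, so each metric is written to wandb once.
    if args.global_rank == 0:
        wandb.log({
            'Eval/epoch': -1,
            'Eval/reward_score': reward_score,
            'Eval/score_std': score_std,
            'Eval/rejected_scores': rejected_scores,
            'Eval/acc': acc,
        })
```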

murphypei · May 23 '23 02:05

You are right. The `args.global_rank == 0` condition around the `evaluation_reward` call has to be removed, because `evaluation_reward` needs all processes to participate.

l294265421 · May 23 '23 02:05
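
A small simulation can show why guarding the evaluation with `args.global_rank == 0` produces a hang rather than a crash. This is a hedged sketch, not code from alpaca-rlhf. It assumes that `evaluation_reward` contains a collective operation that every rank must enter, for example an all-reduce that averages metrics across ranks. Python's `threading.Barrier` stands in for that collective, and two threads play two ranks. `simulate_eval`, `WORLD_SIZE` and their parameters are hypothetical names.

```python
import threading

WORLD_SIZE = 2  # two simulated ranks


def simulate_eval(only_rank0_evaluates, timeout=1.0):
    """Hypothetical simulation: returns what each simulated rank ended up doing."""
    # A Barrier releases only when all WORLD_SIZE parties call wait(), like a
    # collective op (e.g. an all-reduce) that every rank must enter.
    collective = threading.Barrier(WORLD_SIZE)
    results = {}

    def rank_main(rank):
        if only_rank0_evaluates and rank != 0:
            # Mirrors wrapping evaluation_reward in `if args.global_rank == 0`.
            results[rank] = "skipped eval"
            return
        try:
            # The timeout exists only so this demo terminates; a real
            # collective typically keeps waiting far longer.
            collective.wait(timeout=timeout)
            results[rank] = "eval done"
        except threading.BrokenBarrierError:
            results[rank] = "hung (timed out)"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(simulate_eval(only_rank0_evaluates=True))   # rank 0 waits for a peer that never arrives
print(simulate_eval(only_rank0_evaluates=False))  # both ranks pass the collective
```

In a real multi-GPU run there is usually no short timeout, so rank 0 stays blocked inside the collective while the other ranks move on, which matches the "stalls but does not exit" symptom.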

In addition, there is another bug: `rm_model.train()` should be called inside the step loop. [image]

l294265421 · May 23 '23 02:05
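
The placement issue can be sketched with a toy model. This sketch assumes that `evaluation_reward` switches the model to eval mode, as PyTorch evaluation code typically does with `model.eval()`, and does not switch it back. Under that assumption, a single `rm_model.train()` call before the step loop leaves every step after the first evaluation running in eval mode. That matters because layers such as dropout behave differently in the two modes. `ToyModel`, `evaluate` and `train_loop` are hypothetical stand-ins, not alpaca-rlhf code. `ToyModel` only mimics the `training` flag of `torch.nn.Module`.

```python
class ToyModel:
    """Hypothetical stand-in for torch.nn.Module: only tracks the train/eval flag."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


def evaluate(model):
    # Assumed to behave like evaluation_reward: switches to eval mode and leaves it there.
    model.eval()
    return 0.0


def train_loop(model, num_steps, eval_every, reset_each_step):
    """Return the training flag observed at each (simulated) optimisation step."""
    modes = []
    if not reset_each_step:
        model.train()  # buggy placement: called once, before the step loop
    for step in range(num_steps):
        if reset_each_step:
            model.train()  # fixed placement: re-enter train mode at every step
        modes.append(model.training)
        # ... forward, backward and optimizer step would go here ...
        if (step + 1) % eval_every == 0:
            evaluate(model)
    return modes


print(train_loop(ToyModel(), num_steps=6, eval_every=2, reset_each_step=False))
# [True, True, False, False, False, False]
print(train_loop(ToyModel(), num_steps=6, eval_every=2, reset_each_step=True))
# [True, True, True, True, True, True]
```

Calling `train()` at every step is cheap: in PyTorch it only sets the `training` flag on the module and its submodules.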

OK, thanks for your reply.

murphypei · May 23 '23 06:05