DeepSpeed-Chat Step-3 tensorboard loss figures with multiple training epochs

Open GeekDream-x opened this issue 2 years ago • 0 comments

Hi, when training RLHF step-3, I set parameters related to epochs as:

  • ppo_epochs = 1
  • num_train_epochs = 30

and I found that each of the "actor_loss", "actor_loss_sum", "critic_loss", "critic_loss_sum" and "reward" charts (saved in the folder "step3_tensorboard_logs") contains as many separate lines as the value of "num_train_epochs", i.e. one line per epoch:

Screenshot 2023-11-16 at 10 03 43 Screenshot 2023-11-16 at 10 04 05

The question is:

How can I make them display like the charts for "train_loss" or "lr" (saved in the folder "ds_tensorboard_logs") shown below, where all 30 epochs are drawn as one continuous line?

Screenshot 2023-11-16 at 09 57 49 Screenshot 2023-11-16 at 09 58 20

Thanks!

GeekDream-x, Nov 16 '23 03:11
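
One possible cause, offered as an assumption rather than something confirmed against the step-3 script: if the training loop passes the per-epoch loop index as `global_step` when logging, the x-axis restarts at 0 every epoch, and TensorBoard draws each restart as a separate line. Keeping a counter that never resets across epochs would then give one continuous curve. The sketch below illustrates the idea only. `RecordingWriter`, `log_training`, `steps_per_epoch` and the placeholder loss are hypothetical names made up for this example; `RecordingWriter` stands in for a real TensorBoard writer and just mirrors the `add_scalar(tag, scalar_value, global_step)` call shape.

```python
class RecordingWriter:
    """Hypothetical stand-in for a TensorBoard writer.

    It only records the arguments of each add_scalar call so the
    step numbering can be inspected without TensorBoard installed.
    """

    def __init__(self):
        self.calls = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.calls.append((tag, scalar_value, global_step))


def log_training(writer, num_train_epochs, steps_per_epoch):
    """Log one scalar per step with a step counter that spans all epochs."""
    global_step = 0  # never reset inside the epoch loop
    for epoch in range(num_train_epochs):
        for step in range(steps_per_epoch):
            # Placeholder value; a real loop would log the actual loss here.
            actor_loss = 1.0 / (global_step + 1)
            # Logging with `step` instead of `global_step` would restart
            # the x-axis at 0 on every epoch.
            writer.add_scalar("actor_loss", actor_loss, global_step=global_step)
            global_step += 1
    return global_step


writer = RecordingWriter()
total_steps = log_training(writer, num_train_epochs=30, steps_per_epoch=4)
logged_steps = [s for _, _, s in writer.calls]
```

With a real writer, the same pattern would apply to each logged tag ("actor_loss", "critic_loss", "reward", and so on): compute one monotonically increasing step and pass it to every `add_scalar` call.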