DeepSpeedExamples
Fix RLHF loss metrics & single-gpu training script
This PR fixes:
- the actor/critic mean loss calculation
- the step-3 training script for the 1.3B model on a single GPU
- some typos
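
For reference, a minimal sketch of how a per-step mean loss for the actor and critic can be computed. This is not the code changed in this PR: the function name `mean_losses` and the sample values are hypothetical, and the only assumption is that each training step yields one actor loss and one critic loss, averaged over the number of steps that actually ran.

```python
from typing import Iterable, Tuple


def mean_losses(step_losses: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean actor loss, mean critic loss) over all steps.

    Hypothetical helper for illustration; each item is one step's
    (actor_loss, critic_loss) pair.
    """
    actor_sum = 0.0
    critic_sum = 0.0
    steps = 0
    for actor_loss, critic_loss in step_losses:
        actor_sum += actor_loss
        critic_sum += critic_loss
        steps += 1
    if steps == 0:
        raise ValueError("no steps were recorded")
    # Divide by the number of steps that contributed to the sums,
    # not by a fixed constant such as the configured batch count.
    return actor_sum / steps, critic_sum / steps


if __name__ == "__main__":
    # Made-up sample values, not results from any training run.
    actor_mean, critic_mean = mean_losses([(0.5, 1.0), (0.3, 0.8), (0.1, 0.6)])
    print(f"actor mean loss: {actor_mean:.4f}, critic mean loss: {critic_mean:.4f}")
```

Dividing by the counted number of steps keeps the reported mean correct when an epoch ends early or a step is skipped.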
@microsoft-github-policy-service agree