DeepSpeedExamples
Running the 1.6 billion parameter demo is much slower than described, on an A100 GPU?
I ran the 1.6 billion parameter demo: the first step took 1:46:27 and the second step took 2:12:56. That is much slower than the reported times below.
| Actor / Reward | Step 1 | Step 2 | Step 3 | Total |
|---|---|---|---|---|
| OPT-1.3B / OPT-350M | 2900 sec | 670 sec | 1.2 hr | 2.2 hr |
Is there anything to tune, other than running `python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu`?
Thanks
Hi @tcluoct, which GPU are you using to run this example? The values we reported were measured on an A6000 GPU.
I'm using an A100, which has better performance than an A6000.
After training, when I run the final model, it responds very strangely.

@tcluoct I ran step 1 on an A6000 with the latest versions of DeepSpeed and DeepSpeedExamples, and it was much faster than the time you reported.
Can you update both to the latest version and measure the time spent in data loading/forward/backward/step?
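One way to collect the per-phase timings asked for above is to wrap each phase of the training loop in a small timer. This is a minimal sketch, not DeepSpeed-Chat's actual code; the phase names and the commented loop structure are illustrative assumptions:

```python
import time
from contextlib import contextmanager
from collections import defaultdict

# Accumulated wall-clock seconds per training phase (hypothetical phase names).
phase_times = defaultdict(float)

@contextmanager
def timed(phase):
    """Time the enclosed block and add the elapsed seconds to phase_times[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_times[phase] += time.perf_counter() - start

# In a real training loop you would wrap each phase, e.g.:
# with timed("data_loading"): batch = next(data_iter)
# with timed("forward"):      loss = model(batch)
# with timed("backward"):     loss.backward()
# with timed("step"):         optimizer.step()

# Stand-in for one timed phase so the sketch runs on its own:
with timed("forward"):
    time.sleep(0.01)

print({phase: round(seconds, 3) for phase, seconds in phase_times.items()})
```

Note that CUDA kernels launch asynchronously, so for meaningful GPU timings you would call `torch.cuda.synchronize()` before reading the clock at each boundary (or use `torch.profiler` instead of manual timers).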
Closing because we have no further information. Feel free to reopen if the problem still exists.