DeepSpeedExamples
Running the 1.6 billion parameter demo is much slower than described, on an A100 GPU?
I ran the 1.6 billion parameter demo: the first step took 1:46:27 and the second step took 2:12:56. That is much slower than the reported times below.
| Actor / Reward | Step 1 | Step 2 | Step 3 | Total |
|---|---|---|---|---|
| OPT-1.3B / OPT-350M | 2900 sec | 670 sec | 1.2 hr | 2.2 hr |
Is there anything to tune, other than running `python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu`?
Thanks
Hi @tcluoct, which GPU are you using to run this example? The values we reported were measured on an A6000 GPU.
I'm using an A100, which has better performance than an A6000.
After training, when I run the final model, it responds very strangely.

@tcluoct I ran step 1 on an A6000 with the latest versions of DeepSpeed and DeepSpeedExamples, and it was much faster than the time you reported.
Can you update both to the latest version and measure the time spent in data loading/forward/backward/step?
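One way to collect the per-phase timings asked for above is to wrap each phase of the training loop in a small timer. This is a minimal sketch, not DeepSpeed-Chat's actual code; the phase names and the commented loop structure are illustrative assumptions:

```python
import time
from contextlib import contextmanager
from collections import defaultdict

# Accumulated wall-clock seconds per training phase (hypothetical phase names).
phase_times = defaultdict(float)

@contextmanager
def timed(phase):
    """Time the enclosed block and add the elapsed seconds to phase_times[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_times[phase] += time.perf_counter() - start

# In a real training loop you would wrap each phase, e.g.:
# with timed("data_loading"): batch = next(data_iter)
# with timed("forward"):      loss = model(batch)
# with timed("backward"):     loss.backward()
# with timed("step"):         optimizer.step()

# Stand-in for one timed phase so the sketch runs on its own:
with timed("forward"):
    time.sleep(0.01)

print({phase: round(seconds, 3) for phase, seconds in phase_times.items()})
```

Note that CUDA kernels launch asynchronously, so for meaningful GPU timings you would call `torch.cuda.synchronize()` before reading the clock at each boundary (or use `torch.profiler` instead of manual timers).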
Closing because we have no further information. Feel free to reopen if the problem still exists.