DeepSpeedExamples

Reproduction Failure

Open JingerAI opened this issue 2 years ago • 2 comments

I tried to reproduce the 1.6b step1 SFT results using the default single_node script configuration, but training was slow on 4 V100 32G GPUs. It took 6 hours to complete, which is much slower than expected.

JingerAI avatar Apr 19 '23 15:04 JingerAI

Hi @JingerAI - I'm a little confused. There is no 1.6b model. Did you intend to write 6.7b or 1.3b?

mrwyattii avatar Apr 19 '23 17:04 mrwyattii

I think @JingerAI meant 1.3B. I also trained the demo 1.3B model, and it was slow for me too. Maybe some settings are missing.

tcluoct avatar Apr 20 '23 00:04 tcluoct