DeepSpeedExamples
Reproduction Failure
I tried to reproduce the 1.6b step1 SFT results using the default single_node script configuration, but training was slow on 4 V100 32G GPUs. It took 6 hours to complete, which is much slower than expected.
Hi @JingerAI - I'm a little confused. There is no 1.6b model. Did you intend to write 6.7b or 1.3b?
I think @JingerAI meant 1.3B. I also trained the demo 1.3B model, and it was slow for me too. Maybe some setting is missing.