DeepSpeedExamples

"ds_train_bert_nvidia_data_bsz64k_seq128.sh" program stalls at the end of the first epoch

Open inspur-hsslab opened this issue 4 years ago • 2 comments

When I run "ds_train_bert_nvidia_data_bsz64k_seq128.sh", it stalls at the end of the first epoch.

inspur-hsslab avatar Aug 05 '21 08:08 inspur-hsslab

I met the same problem.

dancingpipi avatar Nov 24 '21 06:11 dancingpipi

Met the same problem with DeepSpeed v0.6.1.

haolin-nju avatar Apr 25 '22 11:04 haolin-nju