DeepSpeedExamples

"ds_train_bert_nvidia_data_bsz64k_seq128.sh" program stalls at the end of the first epoch

Open inspur-hsslab opened this issue 2 years ago • 2 comments

When I run "ds_train_bert_nvidia_data_bsz64k_seq128.sh", the program stalls at the end of the first epoch.

image

inspur-hsslab, Aug 05 '21 08:08