Olatunji Ruwase


Awesome. Thanks!

@liuyq47 I can confirm that I also see occasional spikes in all-reduce latency with a similar setup. In my case, I used a single DGX2 node with 16 GPUs and saw...

@liuyq47 Thanks for confirming that this issue shows up with gradient accumulation. I now suspect it has to do with the nvidia dataset, as I don't believe we have previously seen...

@dancingpipi, sorry, I have not run this in a long time and don't have the datasets set up on my box. But can you try /workspace/bert/data/128 and /workspace/bert/data/512? The related configuration setting...

The primary reason is to figure out the number of required gradient accumulation steps.
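For context, the number of gradient accumulation steps typically falls out of the target global batch size, the per-GPU micro-batch size, and the number of GPUs (in DeepSpeed terms, `train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size`). A minimal sketch of that arithmetic, with illustrative function and variable names rather than the actual DeepSpeed internals:

```python
def required_accumulation_steps(global_batch_size, micro_batch_per_gpu, num_gpus):
    """Solve for gradient accumulation steps so that
    global_batch_size == micro_batch_per_gpu * steps * num_gpus."""
    per_step = micro_batch_per_gpu * num_gpus  # samples consumed per optimizer micro-step
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be divisible by micro_batch * num_gpus")
    return global_batch_size // per_step

# e.g. a global batch of 4096 on 16 GPUs with micro-batch 64 per GPU:
print(required_accumulation_steps(4096, 64, 16))  # -> 4
```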

@Zha0q1, thanks for reporting this issue. It is really strange. Do you observe the same behavior with a single-GPU run?

@wenting-zhao, thanks for looking into this. Can you describe what difference you see with this fix? Is the training loss curve or the throughput improved?

@wenting-zhao, for more context, our port for the nvidia dataset was based on this nvidia bert [code](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT) which used [RandomSampler](https://github.com/NVIDIA/DeepLearningExamples/blob/d788e8d4968e72c722c5148a50a7d4692f6e7bd3/PyTorch/LanguageModeling/BERT/run_pretraining.py#L84) because of how their dataset was organized. I have not...

@wenting-zhao, thanks for your explanation. Your description is correct, but it only applies to the case where multiple GPUs are processing one hdf5 file. However, when we did this port, the...
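To illustrate the distinction being discussed: when each rank reads its own hdf5 file, a plain random shuffle (the effect of `RandomSampler` on a rank-local dataset) covers every sample in that file, whereas sharding one shared file across ranks means each rank sees only a disjoint slice, `DistributedSampler`-style. A minimal pure-Python sketch of the two index patterns; the setup is hypothetical and not the actual DeepSpeed/NVIDIA data pipeline:

```python
import random

def per_file_random_indices(num_samples, seed):
    """One hdf5 file per rank: each rank shuffles and visits
    all of its own file's samples."""
    idx = list(range(num_samples))
    random.Random(seed).shuffle(idx)
    return idx

def sharded_indices(num_samples, rank, world_size, seed):
    """One hdf5 file shared by all ranks: shuffle once with a common seed,
    then each rank takes a disjoint strided slice."""
    idx = list(range(num_samples))
    random.Random(seed).shuffle(idx)
    return idx[rank::world_size]

# With 2 ranks sharing an 8-sample file, the shards partition the data:
a = sharded_indices(8, rank=0, world_size=2, seed=0)
b = sharded_indices(8, rank=1, world_size=2, seed=0)
assert sorted(a + b) == list(range(8))
```

The point in the thread is that if each GPU owns a different file, the first pattern is already correct, and the multi-GPU sharding concern only arises in the shared-file case.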