Olatunji Ruwase


Awesome. Thanks!

@liuyq47 I can confirm that I also see occasional spikes in all-reduce latency with a similar setup. In my case, I used a single DGX2 node with 16 GPUs and saw...

@liuyq47 Thanks for confirming that this issue shows up with gradient accumulation. I now suspect it has to do with the nvidia dataset, as I don't believe we have previously seen...

@dancingpipi, sorry, I have not run this in a long time and don't have the datasets set up on my box. But can you try /workspace/bert/data/128 and /workspace/bert/data/512? The related configuration setting...

The primary reason is to figure out the number of required gradient accumulation steps.
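For context, the number of gradient accumulation steps typically falls out of the target global batch size, the per-GPU micro-batch size, and the number of GPUs (in DeepSpeed terms, `train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size`). A minimal sketch of that arithmetic, with illustrative function and variable names rather than the actual DeepSpeed internals:

```python
def required_accumulation_steps(global_batch_size, micro_batch_per_gpu, num_gpus):
    """Solve for gradient accumulation steps so that
    global_batch_size == micro_batch_per_gpu * steps * num_gpus."""
    per_step = micro_batch_per_gpu * num_gpus  # samples consumed per optimizer micro-step
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be divisible by micro_batch * num_gpus")
    return global_batch_size // per_step

# e.g. a global batch of 4096 on 16 GPUs with micro-batch 64 per GPU:
print(required_accumulation_steps(4096, 64, 16))  # -> 4
```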

@Zha0q1, thanks for reporting this issue. It is really strange. Do you observe the same behavior with a single-GPU run?

@wenting-zhao, thanks for looking into this. Can you describe what difference you see with this fix? Is the training loss curve or the throughput improved?

@wenting-zhao, for more context, our port for the nvidia dataset was based on this nvidia bert [code](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT) which used [RandomSampler](https://github.com/NVIDIA/DeepLearningExamples/blob/d788e8d4968e72c722c5148a50a7d4692f6e7bd3/PyTorch/LanguageModeling/BERT/run_pretraining.py#L84) because of how their dataset was organized. I have not...

@wenting-zhao, thanks for your explanation. Your description is correct, but it only applies to the case where multiple GPUs are processing one hdf5 file. However, when we did this port, the...
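To illustrate the distinction being discussed: when each rank reads its own hdf5 file, a plain random shuffle (the effect of `RandomSampler` on a rank-local dataset) covers every sample in that file, whereas sharding one shared file across ranks means each rank sees only a disjoint slice, `DistributedSampler`-style. A minimal pure-Python sketch of the two index patterns; the setup is hypothetical and not the actual DeepSpeed/NVIDIA data pipeline:

```python
import random

def per_file_random_indices(num_samples, seed):
    """One hdf5 file per rank: each rank shuffles and visits
    all of its own file's samples."""
    idx = list(range(num_samples))
    random.Random(seed).shuffle(idx)
    return idx

def sharded_indices(num_samples, rank, world_size, seed):
    """One hdf5 file shared by all ranks: shuffle once with a common seed,
    then each rank takes a disjoint strided slice."""
    idx = list(range(num_samples))
    random.Random(seed).shuffle(idx)
    return idx[rank::world_size]

# With 2 ranks sharing an 8-sample file, the shards partition the data:
a = sharded_indices(8, rank=0, world_size=2, seed=0)
b = sharded_indices(8, rank=1, world_size=2, seed=0)
assert sorted(a + b) == list(range(8))
```

The point in the thread is that if each GPU owns a different file, the first pattern is already correct, and the multi-GPU sharding concern only arises in the shared-file case.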