Olatunji Ruwase

Results: 648 comments by Olatunji Ruwase

@dancingpipi, apologies for the delayed response. Hope the answers below are still helpful. 1. ZeRO is designed to reduce the memory overheads of very large models, with billions of parameters....
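To make the memory savings concrete, here is a minimal sketch of per-GPU model-state memory under data parallelism. It assumes mixed-precision Adam with 2 bytes of fp16 parameters, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state per parameter (the accounting used in the ZeRO paper), and it ignores activations and temporary buffers. The function name `zero_memory_per_gpu` is hypothetical and is not a DeepSpeed API.

```python
def zero_memory_per_gpu(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Bytes of model-state memory per GPU (params + grads + optimizer states).

    Assumes mixed-precision Adam: 2 bytes fp16 params, 2 bytes fp16 grads,
    and k bytes of fp32 optimizer state (param copy, momentum, variance).
    Activations and temporary buffers are not counted.
    """
    psi, nd = num_params, num_gpus
    if stage == 0:  # plain data parallelism: every GPU holds a full replica
        return (2 + 2 + k) * psi
    if stage == 1:  # optimizer states partitioned across GPUs
        return (2 + 2) * psi + k * psi / nd
    if stage == 2:  # gradients partitioned as well
        return 2 * psi + (2 + k) * psi / nd
    if stage == 3:  # parameters partitioned as well
        return (2 + 2 + k) * psi / nd
    raise ValueError(f"unknown ZeRO stage: {stage}")


if __name__ == "__main__":
    # Example: a 7.5B-parameter model trained on 64 GPUs
    for stage in range(4):
        gb = zero_memory_per_gpu(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.2f} GB per GPU")
```

For a 7.5B-parameter model on 64 GPUs this works out to roughly 120 GB per GPU without ZeRO, versus about 31.4, 16.6, and 1.9 GB with stages 1, 2, and 3, which is why the savings matter most for models with billions of parameters.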

@FatCockHu, can you please open a separate ticket for your error? Thanks!

@marchen00, the PLD implementation is split between the DeepSpeed engine and the client. In particular, DeepSpeed maintains the theta and gamma values [here](https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/progressive_layer_drop.py), and with [this logic](https://github.com/microsoft/DeepSpeed/blob/9bf1e9af3a3a958fc74b5d5d57e56b72559f5458/deepspeed/runtime/engine.py#L1530-L1531) makes them available...
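A minimal sketch of that engine/client split, assuming the exponential theta schedule and the linear per-layer keep probability described in the Progressive Layer Dropping paper. `PLDSchedule`, `layer_keep_prob`, and `forward_with_pld` are hypothetical names, not DeepSpeed's API, and the keyword names `progressive_layer_drop` and `pld_theta` are an assumption about what the engine hands to the client model.

```python
import math
import random


class PLDSchedule:
    """Hypothetical engine-side state: owns theta and gamma and updates theta each step."""

    def __init__(self, theta: float = 0.5, gamma: float = 0.001):
        self.theta = theta          # floor that the keep ratio decays toward
        self.gamma = gamma          # how quickly the schedule decays
        self.current_theta = 1.0    # at step 0 nothing is dropped

    def update(self, global_step: int) -> None:
        # theta(t) = (1 - theta) * exp(-gamma * t) + theta
        self.current_theta = (1.0 - self.theta) * math.exp(-self.gamma * global_step) + self.theta

    def get_state(self) -> dict:
        # Assumed keyword arguments passed through to the client model's forward()
        return {"progressive_layer_drop": True, "pld_theta": self.current_theta}


def layer_keep_prob(layer_idx: int, num_layers: int, theta: float) -> float:
    """Client side: keep probability shrinks linearly with depth (layer_idx is 1-based)."""
    return 1.0 - (layer_idx / num_layers) * (1.0 - theta)


def forward_with_pld(x, layers, progressive_layer_drop=False, pld_theta=1.0, rng=random):
    """Hypothetical client forward pass that stochastically skips whole layers."""
    for i, layer in enumerate(layers, start=1):
        if progressive_layer_drop and rng.random() >= layer_keep_prob(i, len(layers), pld_theta):
            continue  # dropped: the layer acts as identity for this step
        x = layer(x)
    return x


if __name__ == "__main__":
    sched = PLDSchedule(theta=0.5, gamma=0.001)
    layers = [lambda v: v + 1] * 4  # toy "layers" that each add 1
    for step in (0, 1000, 5000):
        sched.update(step)
        out = forward_with_pld(0, layers, **sched.get_state())
        print(step, round(sched.current_theta, 3), out)
```

The design point is that the engine only owns the schedule (theta as a function of the global step), while the decision of which layers to skip, and how, stays in the client model.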

@jeyblu, apologies for the delayed response. Is this still a problem?

Can you please share the output of running `ds_report` in your shell?

Thanks for trying out DeepSpeed. Unfortunately, these datasets are not yet publicly available. We are working on resolving this. Apologies for the inconvenience.

@piyushghai We are pleased to announce support for training Bing BERT with the Nvidia dataset (#27). Please give it a try.

@sriramsrao, @oliverhu, @tomekrut We have added support for training with the Nvidia dataset. Thanks for your patience. We would really appreciate feedback on your experience trying it out. Thanks!

@liuyq47 Thanks for trying out the new dataset. Can you be more specific about the timer names and values showing the spikes? The highlighted section of the screenshot seems fine...

Thanks for the clarification. So to confirm, you are observing occasional spikes in allreduce time from ~229 to ~415. Yes, that does look odd. To help repro for a quick...