Mihir Patel

104 comments by Mihir Patel

Ah nevermind I just tested it because I felt bad. It works. woohoo.

@dskhudia @jacobfulano is this still relevant? if so, can we merge? CC: @nik-mosaic if we do want to do these changes in performance

> composer.utils.dist.initialize_dist is called twice in YAHP flow. is this function not idempotent? why does calling it more than once matter? (asking bc we'll hit this soon if people start...

@ananyahjha93 can you try upgrading deepspeed manually? Basically, install composer and then install deepspeed at the higher version (if an upgrade is necessary) to try stage 3? I believe the...

> @mvpatel2000 I have already been using deepspeed==0.7.2. The training seems to work with composer, my only concern is saving of checkpoints because deepspeed stage-3 requires `stage3_gather_16bit_weights_on_model_save` to be set...
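For reference, the flag mentioned above lives in the `zero_optimization` section of a DeepSpeed config. The fragment below is a sketch with every other setting omitted; how the dict is handed to the trainer is not shown and would depend on the Composer version in use.

```python
import json

# Assumed minimal fragment; a real config would carry more settings.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Gather the sharded 16-bit weights so a full state dict can be saved.
        "stage3_gather_16bit_weights_on_model_save": True,
    }
}

print(json.dumps(ds_config, indent=2))
```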

If a model is specified on the meta device, the Trainer will correctly initialize it on GPU when a GPU device is specified

Closing because we don't plan on adding this