Mihir Patel
Ah, never mind, I just tested it because I felt bad. It works. Woohoo.
@dskhudia @jacobfulano Is this still relevant? If so, can we merge? CC: @nik-mosaic in case we want to make these changes in performance.
> composer.utils.dist.initialize_dist is called twice in YAHP flow. is this function not idempotent? why does calling it more than once matter? (asking bc we'll hit this soon if people start...
@ananyahjha93 can you try upgrading deepspeed manually? Basically, install composer and then install deepspeed at the higher version (if an upgrade is necessary) to try stage 3? I believe the...
> @mvpatel2000 I have already been using deepspeed==0.7.2. The training seems to work with composer, my only concern is saving of checkpoints because deepspeed stage-3 requires `stage3_gather_16bit_weights_on_model_save` to be set...
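For reference, a minimal sketch of the ZeRO section that the checkpointing concern above refers to. Only the `zero_optimization` block is shown, and it assumes the dict is handed to the trainer as its DeepSpeed config. Any other keys a real run needs (batch sizes, precision, optimizer) are deliberately left out.

```python
import json

# Hypothetical minimal DeepSpeed config: only the ZeRO section is shown.
deepspeed_config = {
    "zero_optimization": {
        # Stage 3 partitions parameters, gradients, and optimizer state across ranks.
        "stage": 3,
        # With stage 3 the weights are sharded, so they are not part of a plain
        # state_dict; this makes DeepSpeed gather the 16-bit weights before
        # saving them via save_16bit_model().
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

print(json.dumps(deepspeed_config, indent=2))
```

Without that flag, a stage-3 save only captures each rank's partition of the weights rather than a consolidated copy.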
YAHP is gone :)
Closing since it's done :)
This is now implemented :)
If a model is specified on the meta device, Trainer will correctly initialize it on GPU if a GPU device is specified.
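A rough sketch of the general PyTorch pattern behind meta-device initialization. It is not necessarily how Composer's Trainer does this internally, and the layer sizes are hypothetical. The model is built on `meta` with no storage allocated. Storage is then allocated on the target device with `to_empty`, and each layer's `reset_parameters` fills in real initial values.

```python
import torch
import torch.nn as nn

# Build the model on the meta device: parameters have shapes and dtypes
# but no allocated storage.
meta = torch.device("meta")
model = nn.Sequential(
    nn.Linear(8, 16, device=meta),
    nn.ReLU(),
    nn.Linear(16, 2, device=meta),
)
assert all(p.is_meta for p in model.parameters())

# Target device: GPU if one is available, otherwise fall back to CPU.
target = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate real but uninitialized storage on the target device...
model.to_empty(device=target)

# ...then initialize it using each layer's own init routine.
for module in model.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

print({name: p.device.type for name, p in model.named_parameters()})
```

The `to_empty` step matters. Calling `.to(target)` on meta tensors fails because there is no data to copy.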
Closing because we don't plan on adding this