Mihir Patel comments

Results: 104 comments by Mihir Patel

Remove c4 dataset

Ah, never mind, I just tested it because I felt bad. It works. Woohoo.

Update checks for Gated Linear Units Method

@dskhudia @jacobfulano Is this still relevant? If so, can we merge? CC: @nik-mosaic in case we do want to make these changes in performance.

torch_xla multi-device support

> composer.utils.dist.initialize_dist is called twice in YAHP flow. is this function not idempotent? why does calling it more than once matter? (asking bc we'll hit this soon if people start...
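
For readers unfamiliar with the term in the quoted question: an idempotent initializer is one where a second call has no further effect. The snippet below is a minimal, generic sketch of that pattern. `initialize_once` and `_initialized` are hypothetical names made up for illustration; this is not how `composer.utils.dist.initialize_dist` is implemented. In plain PyTorch, a comparable guard can be built on `torch.distributed.is_initialized()`.

```python
# Hypothetical sketch of an idempotent initializer.
# None of these names come from Composer; they are illustrative only.
_initialized = False


def initialize_once(setup):
    """Run `setup` on the first call only. Return True if it ran."""
    global _initialized
    if _initialized:
        # Already set up: a repeated call is a harmless no-op.
        return False
    setup()
    _initialized = True
    return True


calls = []
first = initialize_once(lambda: calls.append("init"))
second = initialize_once(lambda: calls.append("init"))
```

With a guard like this, calling the initializer twice in one flow is safe, because the setup work only runs the first time.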

Support for Deepspeed stage-3

@ananyahjha93 can you try upgrading deepspeed manually? Basically, install composer and then install deepspeed at the higher version (if an upgrade is necessary) to try stage 3? I believe the...

> @mvpatel2000 I have already been using deepspeed==0.7.2. The training seems to work with composer, my only concern is saving of checkpoints because deepspeed stage-3 requires `stage3_gather_16bit_weights_on_model_save` to be set...
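
As a rough illustration of the setting mentioned in the quote, here is a sketch of a DeepSpeed config fragment that enables ZeRO stage 3 and sets `stage3_gather_16bit_weights_on_model_save`. The two keys are standard DeepSpeed config options; how the resulting dictionary gets handed to Composer is not covered in this thread, so that part is left out rather than guessed.

```python
import json

# Sketch of a DeepSpeed config fragment (assumption: all other settings
# are left at their defaults and supplied elsewhere).
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Gather the sharded 16-bit weights when the model is saved.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

# DeepSpeed configs are commonly stored as JSON, so serialize for inspection.
config_json = json.dumps(ds_config, indent=2)
```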

Document difference between *Hparams and associated class

YAHP is gone :)

Embed the algorithm and model metadata directly in the docs

Closing since it's done :)

avoid double forward pass for train metrics

This is now implemented :)

Initialize Models on GPU

If a model is specified on the meta device, Trainer will correctly initialize it on the GPU when a GPU device is specified.

Pass the model initializer into `Trainer.init`

Closing because we don't plan on adding this