
Support for Deepspeed stage-3

ananyahjha93 opened this issue 2 years ago · 5 comments

🚀 Feature Request

The documentation here states that stage-3 is not yet supported.

https://docs.mosaicml.com/en/v0.10.0/notes/distributed_training.html#deepspeed

I tried passing this config to the trainer and it seems to work:

```python
deepspeed_config = {
    "zero_optimization": {
        "stage": 3,
        "stage3_gather_16bit_weights_on_model_save": True,
    }
}
```
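For reference, roughly how it gets passed in (a sketch; `model` and `train_dataloader` are placeholders, not from my actual script):

```python
from composer import Trainer

trainer = Trainer(
    model=model,                        # any ComposerModel (placeholder)
    train_dataloader=train_dataloader,  # your training DataLoader (placeholder)
    max_duration="1ep",
    deepspeed_config=deepspeed_config,  # the stage-3 dict above enables DeepSpeed
)
trainer.fit()
```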

Exactly what is missing from stage-3 support in the trainer? Is checkpoint saving not configured correctly for stage-3's sharded state_dict?

Motivation

Stage-3 would add model parameter sharding via DeepSpeed, enabling much larger models to be trained with Composer.

ananyahjha93 avatar Oct 02 '22 11:10 ananyahjha93

Hi @ananyahjha93, thanks for the interest! Composer is not actively developing new features on top of DeepSpeed; DeepSpeed support is essentially in maintenance mode, so we can't guarantee bug-free stage-3 support right now. However, if you are interested, you can try Composer + FSDP; you can look at this doc for more information. We are also working on a full-fledged FSDP doc for Composer, so stay tuned for the update. Thanks!

karan6181 avatar Oct 04 '22 00:10 karan6181

@karan6181 I wasn't able to get BLOOM working properly with FSDP. I believe BLOOM was trained with Megatron-LM plus DeepSpeed and hasn't been tested with FSDP, whereas it works out of the box with DeepSpeed stage-2 in Composer. So for my current research I would prefer to stick with DeepSpeed rather than spend time getting BLOOM to work with FSDP!

ananyahjha93 avatar Oct 04 '22 21:10 ananyahjha93

@ananyahjha93 can you try upgrading DeepSpeed manually? That is, install Composer and then install a newer version of DeepSpeed (if an upgrade is necessary) to try stage 3. I believe the version Composer pins is out of date.

Currently we don't have a good set of DeepSpeed benchmarks, so we've been cautious about bumping the version Composer requires: without coverage tests we're not sure what an upgrade would break. However, I think stage 3 should still work if you upgrade manually.

mvpatel2000 avatar Oct 10 '22 20:10 mvpatel2000

@mvpatel2000 I have already been using deepspeed==0.7.2. Training seems to work with Composer; my only concern is checkpoint saving. DeepSpeed stage-3 requires stage3_gather_16bit_weights_on_model_save to be set to true because the state_dict is sharded, so I am not sure whether Composer needs to handle checkpoint saving differently.

ananyahjha93 avatar Oct 10 '22 20:10 ananyahjha93

> @mvpatel2000 I have already been using deepspeed==0.7.2. Training seems to work with Composer; my only concern is checkpoint saving. DeepSpeed stage-3 requires stage3_gather_16bit_weights_on_model_save to be set to true because the state_dict is sharded, so I am not sure whether Composer needs to handle checkpoint saving differently.

Hm... one option is to build your own CheckpointSaver callback: copy the one we have in Composer and modify how the checkpoint is saved. Then, instead of passing any checkpointing args to the Trainer, pass your modified callback for DeepSpeed stage-3.
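Something like this sketch, as a starting point (untested; the callback name, the epoch_end hook, and save_dir are all illustrative, and it assumes state.model is the DeepSpeedEngine that Composer creates when a deepspeed_config is passed):

```python
from composer.core import Callback, State
from composer.loggers import Logger


class DeepSpeedStage3Checkpointer(Callback):
    """Hypothetical stand-in for CheckpointSaver that defers to DeepSpeed.

    DeepSpeedEngine.save_checkpoint writes the sharded ZeRO stage-3 state
    (every rank must call it), and with
    stage3_gather_16bit_weights_on_model_save=True DeepSpeed can also
    consolidate the fp16 weights on save.
    """

    def __init__(self, save_dir: str = "checkpoints"):
        self.save_dir = save_dir

    def epoch_end(self, state: State, logger: Logger) -> None:
        # Assumes state.model is the DeepSpeedEngine built by the Trainer.
        epoch = state.timestamp.epoch.value
        state.model.save_checkpoint(self.save_dir, tag=f"ep{epoch}")
```

You'd then pass it as Trainer(callbacks=[DeepSpeedStage3Checkpointer()]) and leave out save_folder / save_interval so the built-in saver stays out of the way.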

mvpatel2000 avatar Oct 10 '22 20:10 mvpatel2000

@ananyahjha93 note that we just updated Composer to support the most recent DeepSpeed release, in case this is still an issue.

mvpatel2000 avatar Dec 06 '22 18:12 mvpatel2000

Closing for now since we've updated DeepSpeed.

mvpatel2000 avatar Jun 22 '23 21:06 mvpatel2000