Support for DeepSpeed stage-3
🚀 Feature Request
The documentation here states that stage-3 is not yet supported.
https://docs.mosaicml.com/en/v0.10.0/notes/distributed_training.html#deepspeed
I tried passing this config to the trainer and it seems to work:
deepspeed_config = {"zero_optimization": {"stage": 3, "stage3_gather_16bit_weights_on_model_save": True}}
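For reference, this is roughly how I wire it into the Trainer — a minimal sketch, where `my_composer_model` and `train_loader` are placeholder names for my own ComposerModel and dataloader:

```python
# Minimal sketch: passing a ZeRO stage-3 config to the Composer Trainer.
# `my_composer_model` and `train_loader` are placeholders, not real objects.
from composer import Trainer

deepspeed_config = {
    "zero_optimization": {
        "stage": 3,
        "stage3_gather_16bit_weights_on_model_save": True,
    }
}

trainer = Trainer(
    model=my_composer_model,          # a ComposerModel
    train_dataloader=train_loader,
    max_duration="1ep",
    deepspeed_config=deepspeed_config,
)
trainer.fit()
```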
What exactly is missing from stage-3 support in the trainer? Is checkpoint saving not configured properly for the sharded state_dict produced by stage-3?
Motivation
Stage-3 would enable model parameter sharding via DeepSpeed and allow much larger models to be trained with Composer.
Hi @ananyahjha93, thanks for the interest! Composer is not actively developing new features on top of deepspeed; deepspeed support is essentially in maintenance mode. Hence, we can't guarantee bug-free support for deepspeed stage-3 right now. However, if you are interested, you can try out Composer + FSDP and look at this doc for more information. We are also working on a full-fledged FSDP doc for Composer, so stay tuned for the update. Thanks!
@karan6181 I wasn't able to make BLOOM work properly with FSDP. I think BLOOM was trained using Megatron-LM with deepspeed and hasn't been tested with FSDP, and it works out of the box with deepspeed stage-2 in Composer. So for my current research I would prefer to stick with deepspeed rather than delve into making BLOOM work with FSDP!
@ananyahjha93 can you try upgrading deepspeed manually? That is, install Composer and then install deepspeed at a higher version (if an upgrade is necessary) to try stage-3. I believe the version Composer installs is out of date.
Currently we don't have a good set of benchmarks with deepspeed, so we've been cautious about upgrading the version Composer requires; without coverage tests we're not sure what an upgrade would break. However, I think it should still work if you upgrade it manually.
@mvpatel2000 I have already been using deepspeed==0.7.2. Training seems to work with Composer; my only concern is checkpoint saving, because deepspeed stage-3 requires stage3_gather_16bit_weights_on_model_save to be set to true since the state_dict is sharded, so I am not sure whether checkpoint saving needs to be handled differently in Composer.
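For what it's worth, if that gather flag is not set, deepspeed does ship a helper that consolidates a sharded ZeRO checkpoint into a single fp32 state_dict — a rough sketch below, where the checkpoint path is just a placeholder:

```python
# Sketch, assuming a raw DeepSpeed ZeRO checkpoint directory exists on disk:
# consolidate the sharded stage-3 shards into one fp32 state_dict.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

state_dict = get_fp32_state_dict_from_zero_checkpoint("path/to/checkpoint_dir")
```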
Hm... one option is to build your own CheckpointSaver callback. You can copy-paste the one we have in Composer and modify how the checkpoint is saved. Then, instead of passing any checkpointing args into Trainer, you can pass your modified callback for deepspeed stage-3.
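Rather than copying the full CheckpointSaver, a bare-bones custom callback could look roughly like the sketch below. This is not a tested implementation: it assumes state.model is the DeepSpeedEngine when a deepspeed_config is passed to Trainer, and that save_16bit_model works because stage3_gather_16bit_weights_on_model_save is enabled; the directory and filename are placeholders.

```python
# Sketch of a custom callback that writes a consolidated 16-bit model at the
# end of each epoch. Assumes `state.model` is the DeepSpeedEngine (i.e. a
# deepspeed_config was passed to Trainer) and that
# stage3_gather_16bit_weights_on_model_save=True in that config.
from composer.core import Callback, State
from composer.loggers import Logger


class Stage3CheckpointSaver(Callback):

    def __init__(self, save_dir: str = "checkpoints"):
        self.save_dir = save_dir

    def epoch_end(self, state: State, logger: Logger) -> None:
        # With the gather flag set, DeepSpeed collects the sharded stage-3
        # parameters onto rank 0 and writes a single fp16 checkpoint file.
        epoch = state.timestamp.epoch.value
        state.model.save_16bit_model(self.save_dir, f"model_ep{epoch}.pt")
```

You would then pass callbacks=[Stage3CheckpointSaver()] to Trainer instead of the built-in checkpointing args.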
@ananyahjha93 note that we just updated Composer to support the most recent deepspeed release, in case this is still an issue.
Closing for now since we've updated deepspeed