
How to use Tutel with Megatron-DeepSpeed

Open · wangyuxin87 opened this issue 2 years ago · 4 comments

Can Tutel be used with Megatron-DeepSpeed?

wangyuxin87 · Jul 15 '23

Do you mean Megatron and DeepSpeed separately, or all of them working together?

ghostplant · Jul 17 '23

@ghostplant Can Tutel work with Megatron or DeepSpeed, each on its own?

xcwanAndy · Apr 25 '24

Yes. Tutel is just an MoE layer implementation that is pluggable into any distributed framework. Another framework uses the Tutel MoE layer by passing in the appropriate distributed process group, e.g.:

# Create a process group via the host framework (DeepSpeed here)
my_processing_group = deepspeed.new_group(..)

# Hand the group to Tutel so its MoE communication (e.g. the all-to-all)
# stays inside that group
moe_layer = tutel_moe.moe_layer(
    ..,
    group=my_processing_group
)

If no other framework is available, Tutel itself provides a one-line initialization that generates the groups you need; it works for both distributed GPU (i.e., NCCL) and distributed CPU (i.e., Gloo):

from tutel import system

# One-line initialization; choose the backend to match your device
parallel_env = system.init_data_model_parallel(backend='nccl' if args.device == 'cuda' else 'gloo')

# Pick whichever group fits your parallelism layout:
# parallel_env.data_group, parallel_env.model_group, or parallel_env.global_group
my_processing_group = parallel_env.data_group
...
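
Putting both pieces together, a hedged end-to-end sketch (again with illustrative sizes, and assuming parallel_env.local_device as used in Tutel's examples) might look like:

import torch
from tutel import system
from tutel import moe as tutel_moe

# Tutel's own one-line initialization (no Megatron/DeepSpeed required)
parallel_env = system.init_data_model_parallel(backend='nccl' if torch.cuda.is_available() else 'gloo')
device = parallel_env.local_device

moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn',
             'count_per_node': 1,
             'hidden_size_per_expert': 4096,
             'activation_fn': torch.nn.functional.relu},
    group=parallel_env.data_group,   # or model_group / global_group, per your layout
).to(device)

x = torch.randn(4, 512, 1024, device=device)   # [batch, seq_len, model_dim], illustrative
y = moe_layer(x)                                # tokens routed to experts within the group

Whether data_group, model_group, or global_group is the right choice depends on how expert parallelism maps onto your existing data/model-parallel layout.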

ghostplant · Apr 25 '24

Thanks for your prompt response!

xcwanAndy · Apr 25 '24