different load_balancing_loss with different pipeline_parallel_size
I loaded the same model trained with megatron + megablocks, and I found that the load_balancing_loss is slightly different. As I increase pipeline_parallel_size, the load_balancing_loss also increases. Is this just a precision issue, or is there a potential bug?
For example, when I train a 500M GPT model with 64 experts, I get the load_balancing_loss (lbl) values listed below for each pipeline_parallel_size (pp_size).
| pp_size | lbl |
|---|---|
| 1 | 1.005E-01 |
| 2 | 1.007E-01 |
| 4 | 1.013E-01 |
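
For context, here is a minimal, self-contained sketch of the Switch-Transformer-style auxiliary load balancing loss and a per-stage reduction of the kind pipeline parallelism introduces. This is illustrative only, not megablocks' actual implementation; the names `layer_load_balancing_loss`, `pipeline_lbl`, and `router_logits_per_layer` are hypothetical. It shows one place where changing pp_size can change the result: splitting the same set of layers into more stages changes the floating point reduction order.

```python
import torch
import torch.nn.functional as F

def layer_load_balancing_loss(router_logits: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Switch-style aux loss for one MoE layer. router_logits: [num_tokens, num_experts]."""
    probs = F.softmax(router_logits, dim=-1)              # soft assignment per token
    top1 = probs.argmax(dim=-1)                           # hard top-1 routing decision
    # f_i: fraction of tokens dispatched to expert i
    f = F.one_hot(top1, num_experts).float().mean(dim=0)
    # p_i: mean router probability assigned to expert i
    p = probs.mean(dim=0)
    return num_experts * torch.sum(f * p)

def pipeline_lbl(router_logits_per_layer, num_experts: int, pp_size: int) -> torch.Tensor:
    """Average the per-layer losses, reducing within each pipeline stage first.

    Assumes pp_size evenly divides the number of layers. Each stage sums the
    losses of its local layers before the per-stage partial sums are combined,
    so a different pp_size produces a different summation order and can shift
    the reported loss in the low-order digits.
    """
    layers = list(router_logits_per_layer)
    per_stage = len(layers) // pp_size
    stage_sums = []
    for s in range(pp_size):
        chunk = layers[s * per_stage:(s + 1) * per_stage]
        stage_sums.append(sum(layer_load_balancing_loss(x, num_experts) for x in chunk))
    return torch.stack(stage_sums).sum() / len(layers)
```

Reduction-order effects alone are usually tiny, though, so differences in the third significant digit like the ones in the table above may point at something else, e.g. how the per-layer losses are scaled or averaged per stage.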