[BUG] reduce_aux_losses_tracker_across_ranks hangs if first pipeline stage has no MoE layers
Describe the bug
https://github.com/NVIDIA/Megatron-LM/blob/8a5521ac4226fbefeeb2a102ebecac32a01d4852/megatron/core/transformer/moe/moe_utils.py#L586-L588
reduce_aux_losses_tracker_across_ranks performs an all_reduce across _PIPELINE_MODEL_PARALLEL_GROUP. If some pipeline stage has no MoE layers, the all_reduce hangs.
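For context, here is a standalone sketch (hypothetical names, not Megatron's actual code) of why mismatched collective participation deadlocks: the rank whose tracker is empty never issues the all_reduce, so the other pipeline ranks block forever.

```python
# Toy illustration of the deadlock pattern; run with e.g.
#   torchrun --nproc_per_node=2 hang_demo.py
import torch
import torch.distributed as dist

def reduce_tracker(tracker: dict, group=None):
    # Assumed pattern: iterate over locally recorded aux losses and
    # all_reduce each one. Ranks with an empty tracker issue no collectives.
    for _, entry in tracker.items():
        dist.all_reduce(entry["values"], group=group)

def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    if rank == 0:
        tracker = {}  # stage with no MoE layers records nothing
    else:
        tracker = {"load_balancing_loss": {"values": torch.ones(13, device="cuda")}}

    reduce_tracker(tracker)  # ranks != 0 block here; rank 0 skips it -> hang
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```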
To Reproduce Run training with:
--tensor-model-parallel-size 1
--pipeline-model-parallel-size 8
--expert-model-parallel-size 1
--expert-tensor-parallel-size 1
--num-layers 16
--moe-layer-freq "([0]*3+[1]*13)"
will hang because the first PP stage has no tracker info while the other stages have tracker info like {'load_balancing_loss': {'values': tensor([...])}} (see the sketch below for why only the first stage is affected).
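A quick sketch of the layer-to-stage arithmetic for this config (assuming a uniform split of 16 layers over 8 stages, i.e. 2 layers per stage) shows that only the first stage ends up with no MoE layers:

```python
# --moe-layer-freq "([0]*3+[1]*13)": first 3 layers dense, remaining 13 are MoE.
num_layers = 16
pp_size = 8
moe_pattern = eval("([0]*3+[1]*13)")

layers_per_stage = num_layers // pp_size
for stage in range(pp_size):
    layers = range(stage * layers_per_stage, (stage + 1) * layers_per_stage)
    moe_layers = [l for l in layers if moe_pattern[l] == 1]
    print(f"pp stage {stage}: moe layers = {moe_layers}")
# pp stage 0: moe layers = []   <- never populates the aux loss tracker
# pp stage 1: moe layers = [3]
# pp stage 2: moe layers = [4, 5]
# ...
```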
Expected behavior The first PP stage should contribute zero-padded values so the all_reduce matches across all pipeline ranks.
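A hedged sketch of one possible fix (the helper name, signature, and tensor shape are assumptions, not Megatron's API): agree on the union of loss names across the pipeline group, then pad missing entries with zeros before reducing.

```python
import torch
import torch.distributed as dist

def reduce_aux_losses_tracker_padded(tracker: dict, num_layers: int, pp_group):
    # Gather the locally recorded loss names from every pipeline rank.
    local_names = sorted(tracker.keys())
    gathered = [None] * dist.get_world_size(group=pp_group)
    dist.all_gather_object(gathered, local_names, group=pp_group)
    union_names = sorted({n for names in gathered for n in names})

    for name in union_names:
        if name not in tracker:
            # A stage with no MoE layers contributes zeros so every rank
            # issues the same sequence of collectives.
            tracker[name] = {"values": torch.zeros(num_layers, device="cuda")}
        dist.all_reduce(tracker[name]["values"], group=pp_group)
```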
Stack trace/logs N/A
Environment (please complete the following information):
- Megatron-LM commit ID: 8a5521ac4226fbefeeb2a102ebecac32a01d4852
- PyTorch version: 2.5.1
- CUDA version: 12.4
- NCCL version: 2.21.5
Proposed fix N/A
Additional context N/A
This is a known issue. We are currently fixing it. Thanks for reporting.
Can you also please look into this issue posted here: https://github.com/NVIDIA/Megatron-LM/issues/1462#issuecomment-2732642584, as part of this?
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.