Megatron-LM
[QUESTION] Why do we need both `train_valid_test_datasets_provider.is_distributed = True` and batched data broadcasting?
I noticed that when `train_valid_test_datasets_provider.is_distributed = True`, the data loader is created on every process, regardless of its tensor parallel rank:
https://github.com/NVIDIA/Megatron-LM/blob/c02b335b6318ada8c6a38c95ce3c754da2a579f9/pretrain_vlm.py#L333
https://github.com/NVIDIA/Megatron-LM/blob/c02b335b6318ada8c6a38c95ce3c754da2a579f9/megatron/training/training.py#L1685
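To make sure I'm reading it correctly, here is a simplified sketch of the condition as I understand it (the function name is mine, not something from the repo):

```python
# Simplified sketch of my reading of the dataloader construction in
# megatron/training/training.py -- not the actual code, and
# `loaders_built_on_this_rank` is my own name for illustration.
from megatron.core import mpu

def loaders_built_on_this_rank(is_distributed: bool) -> bool:
    # With is_distributed=True, every rank builds its own dataloader,
    # regardless of its tensor-parallel rank.
    if is_distributed:
        return True
    # Otherwise only TP rank 0 builds loaders; the other TP ranks are
    # expected to receive their batches via broadcast in get_batch().
    return mpu.get_tensor_model_parallel_rank() == 0
```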
However, in `get_batch()` the batched data is still broadcast:
https://github.com/NVIDIA/Megatron-LM/blob/c02b335b6318ada8c6a38c95ce3c754da2a579f9/pretrain_vlm.py#L242
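For reference, this is roughly the broadcasting pattern I am referring to (a hand-written sketch with plain `torch.distributed` and the usual `mpu` helpers; `broadcast_batch_from_tp_rank0` is not a function from the repo):

```python
# Hand-written illustration of "broadcast the batch from TP rank 0",
# not the repo's actual helper.
import torch.distributed as dist
from megatron.core import mpu

def broadcast_batch_from_tp_rank0(batch: dict) -> dict:
    """Send every tensor in `batch` from TP rank 0 to the other ranks
    of the same tensor-parallel group."""
    src = mpu.get_tensor_model_parallel_src_rank()
    group = mpu.get_tensor_model_parallel_group()
    for tensor in batch.values():
        # Every rank must already hold a CUDA tensor of matching shape
        # and dtype; TP rank 0 has the real data, the others have
        # theirs overwritten by the broadcast.
        dist.broadcast(tensor, src=src, group=group)
    return batch
```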
I'm confused about why we need both. My understanding is that we need either distributed data loading on every rank or broadcasting from TP rank 0, but not both.