Megatron-LM
[QUESTION]
Hi, I am training a Llama2-7b model with Megatron-LM on four H20 nodes (32 GPUs in total), with the parallel strategy set to TP=8 / PP=2 / DP=2. I would like to know how much data is communicated within each parallel group (TP, PP, and DP). Is there a parameter I can set to report these values? If not, how can I obtain them from the code? A rough sketch of what I have in mind is below. Thank you.
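To make the question concrete, here is a minimal sketch of one way I imagine measuring this: wrapping a `torch.distributed` collective so every call adds the tensor's byte size to a per-group counter. This is not an existing Megatron-LM option; the wrapper, the counter labels, and the idea of attributing bytes to the TP/PP/DP groups via `megatron.core.parallel_state` are my own assumptions.

```python
# Sketch only: count bytes moved by torch.distributed.all_reduce calls.
# The same pattern could be repeated for all_gather, broadcast, send, recv, etc.
import functools
from collections import defaultdict

import torch
import torch.distributed as dist

comm_bytes = defaultdict(int)  # label -> accumulated bytes on this rank

_orig_all_reduce = dist.all_reduce

@functools.wraps(_orig_all_reduce)
def all_reduce_counted(tensor, *args, **kwargs):
    # Positional signature is (tensor, op, group, async_op); recover the group
    # whether it was passed positionally or as a keyword.
    group = kwargs.get("group", args[1] if len(args) > 1 else None)
    # Here one could instead compare `group` against Megatron's groups, e.g.
    # megatron.core.parallel_state.get_tensor_model_parallel_group(),
    # to label the bytes as TP / PP / DP traffic (assumed usage, not verified).
    comm_bytes[f"all_reduce:{id(group)}"] += tensor.numel() * tensor.element_size()
    return _orig_all_reduce(tensor, *args, **kwargs)

# Install the wrapper before training starts; calls that bound the original
# function directly (e.g. `from torch.distributed import all_reduce`) would
# bypass it, so this only catches `dist.all_reduce(...)`-style call sites.
dist.all_reduce = all_reduce_counted

# At the end of a step, one rank could dump the counters:
#   for label, nbytes in comm_bytes.items():
#       print(f"{label}: {nbytes / 2**20:.1f} MiB")
```

Is something like this the intended way to get these numbers, or does Megatron-LM already expose communication-volume logging somewhere?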