Megatron-LM

[QUESTION] Why should CUDA_DEVICE_MAX_CONNECTIONS=1 be set when using seq_parallel or async comm?

Status: Open. Infi-zc opened this issue 2 years ago; 5 comments.

I have a question: why is it necessary to set CUDA_DEVICE_MAX_CONNECTIONS=1 when seq_parallel is enabled? A comment in the bwd_compute function says this is done so that the communication is launched first and can overlap with computation, but I don't understand why the two are necessarily connected. Can anyone help explain? Thank you.

Infi-zc, Oct 09 '23

It makes the GPU start kernels in the same order the host queued them. This is needed for overlapping GEMMs with tensor-parallel (TP) communication: it lets the communication kernel be scheduled on the GPU ahead of the GEMM, so the communication kernel gets its GPU resources (SMs) allocated before the GEMM takes all of them. From my understanding, it is essentially like having a single hardware queue shared by all streams.

wdykas, Oct 12 '23
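To make the ordering argument above concrete, here is a toy Python model, not CUDA's actual scheduler. It assumes that stream `s` maps to hardware queue `s % max_connections`, that each hardware queue is strictly FIFO, and that there is no ordering guarantee between different queues; the real stream-to-queue mapping is not documented here, and the function name `possible_dispatch_orders` and the stream ids are hypothetical. The model lists every dispatch order the device could pick for kernels issued in a given host order.

```python
from typing import Dict, List, Sequence, Tuple

# (stream_id, kernel_name), listed in host issue order
Kernel = Tuple[int, str]


def possible_dispatch_orders(issued: Sequence[Kernel], max_connections: int) -> List[Tuple[str, ...]]:
    """Toy model: enumerate every order in which queued kernels could be dispatched.

    Assumptions (simplifications, not documented CUDA behavior):
    - stream `s` is placed on hardware queue `s % max_connections`;
    - each hardware queue dispatches strictly in FIFO order;
    - kernels in different queues have no ordering guarantee.
    """
    queues: Dict[int, List[str]] = {}
    for stream_id, kernel in issued:
        queues.setdefault(stream_id % max_connections, []).append(kernel)

    orders: List[Tuple[str, ...]] = []

    def interleave(heads: Dict[int, int], prefix: List[str]) -> None:
        # Finished once every queue has been drained.
        if all(heads[q] == len(queues[q]) for q in queues):
            orders.append(tuple(prefix))
            return
        # Any non-empty queue may supply the next dispatched kernel.
        for q in sorted(queues):
            if heads[q] < len(queues[q]):
                heads[q] += 1
                interleave(heads, prefix + [queues[q][heads[q] - 1]])
                heads[q] -= 1

    interleave({q: 0 for q in queues}, [])
    return orders


if __name__ == "__main__":
    # Host issues the async all-gather (hypothetical stream 1) before the GEMM (stream 0).
    issued = [(1, "allgather"), (0, "gemm")]
    print(possible_dispatch_orders(issued, max_connections=1))  # [('allgather', 'gemm')]
    print(possible_dispatch_orders(issued, max_connections=8))  # [('gemm', 'allgather'), ('allgather', 'gemm')]
```

With a single connection the only possible order is the host issue order, so an all-gather issued before the GEMM is always dispatched first. With 8 queues and the two kernels on different streams, the model also allows the GEMM to go first and occupy the SMs. Dispatch order is not the same as execution order: kernels on different streams can still overlap once both are running.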

@wdykas Thanks for your reply! I have some further questions about that env var.

  1. Does this environment variable have an upper limit?
  2. What is the default value?

Infi-zc, Oct 16 '23

Marking as stale. No activity in 60 days.

github-actions[bot], Dec 15 '23


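For both questions (upper limit and default value), see the CUDA environment variables section of the CUDA C++ Programming Guide, which lists a valid range of 1 to 32 and a default of 8: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-environment-variables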

JigaoLuo, Feb 23 '24
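As a sketch of how the documented range (1 to 32) and default (8) could be turned into a pre-flight check before training starts, here is a small Python helper. Both `effective_max_connections` and `check_tp_overlap_env` are hypothetical names, not Megatron-LM's own code; rejecting out-of-range values is this helper's policy, not a description of what the CUDA driver does with them.

```python
import os
from typing import Mapping, Optional

# Range and default as listed in the CUDA C++ Programming Guide's environment-variable table.
DEFAULT_MAX_CONNECTIONS = 8
VALID_RANGE = range(1, 33)  # 1..32 inclusive


def effective_max_connections(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the CUDA_DEVICE_MAX_CONNECTIONS value the process would run with (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_DEVICE_MAX_CONNECTIONS")
    if raw is None:
        return DEFAULT_MAX_CONNECTIONS
    value = int(raw)  # raises ValueError for non-integer strings
    if value not in VALID_RANGE:
        # This helper's own policy; it does not model how CUDA treats such values.
        raise ValueError(f"CUDA_DEVICE_MAX_CONNECTIONS must be between 1 and 32, got {value}")
    return value


def check_tp_overlap_env(sequence_parallel: bool, async_tp_comm: bool,
                         env: Optional[Mapping[str, str]] = None) -> None:
    """Hypothetical check: require a single connection when TP comm/GEMM overlap is wanted."""
    if (sequence_parallel or async_tp_comm) and effective_max_connections(env) != 1:
        raise RuntimeError(
            "Set CUDA_DEVICE_MAX_CONNECTIONS=1 before CUDA is initialized so that "
            "communication kernels are dispatched ahead of the GEMMs they overlap with"
        )
```

Because the variable is read when the process initializes CUDA, it is safest to export it in the shell that launches training (for example `CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun ...`) rather than setting it from Python after CUDA is already in use.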

Marking as stale. No activity in 60 days.

github-actions[bot], Apr 24 '24