
[QUESTION] Performance Impact of Using item() in `total_num_tokens += num_tokens.item()` in megatron/core/pipeline_parallel/schedules.py

Open wan-nan opened this issue 10 months ago • 2 comments

Hi Megatron-LM team!

While going through the code in `megatron/core/pipeline_parallel/schedules.py`, I noticed that the line `total_num_tokens += num_tokens.item()` calls `item()` between each forward and backward pass.

https://github.com/NVIDIA/Megatron-LM/blob/8ca9e57f9d0bb93fc61850ebdccb6b6e6fa36b64/megatron/core/pipeline_parallel/schedules.py#L451-L467

From my understanding, `item()` copies the value from the GPU to the host, which forces the CPU to block until the GPU has finished the work queued ahead of it. This might have a negative impact on performance, as illustrated below.

Image
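For readers without the profiler trace, here is a minimal sketch of how the host-side wait could be measured. The helper name `host_wait_seconds`, the matrix size, and the matmul standing in for a forward step are all assumptions for illustration and are not taken from Megatron-LM. The difference is only meaningful on a CUDA device; on CPU both variants run synchronously.

```python
import time
import torch

def host_wait_seconds(use_item: bool, steps: int = 8) -> float:
    """Hypothetical helper: time the host spends issuing `steps` iterations."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(512, 512, device=device)
    total_on_host = 0
    total_on_device = torch.zeros((), dtype=torch.int64, device=device)

    start = time.perf_counter()
    for _ in range(steps):
        y = x @ x                   # stand-in for a forward step (async launch on CUDA)
        num_tokens = (y > 0).sum()  # 0-dim tensor living on `device`
        if use_item:
            # Device-to-host copy: the host waits for the queued GPU work.
            total_on_host += num_tokens.item()
        else:
            # Stays on the device, so the host can keep queuing work.
            total_on_device += num_tokens
    elapsed = time.perf_counter() - start

    if device == "cuda":
        torch.cuda.synchronize()    # drain the queue before returning
    return elapsed

with_item = host_wait_seconds(use_item=True)
without_item = host_wait_seconds(use_item=False)
print(f"with item(): {with_item:.6f}s, without: {without_item:.6f}s")
```

On a GPU, the first number would be expected to include the time the host spent waiting for each step to finish, while the second mostly reflects launch overhead.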

To validate this, I removed the `item()` call and observed that the time cost associated with this operation was completely eliminated.

Image
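One way to keep the running count without a per-step synchronization is to accumulate it as a tensor on the device and read it back once at the end. The sketch below assumes that approach; `count_tokens` and the toy masks are hypothetical names and data, and this is not necessarily how the Megatron-LM code handles it.

```python
import torch

def count_tokens(loss_masks):
    """Hypothetical helper: sum token counts on the device, no per-step item()."""
    device = loss_masks[0].device
    total_num_tokens = torch.zeros((), dtype=torch.int64, device=device)
    for loss_mask in loss_masks:
        num_tokens = loss_mask.sum()    # 0-dim tensor, stays on the device
        total_num_tokens += num_tokens  # tensor add, no host round trip
    return total_num_tokens

# Toy masks for illustration only (1 = token counted, 0 = masked out).
masks = [
    torch.tensor([1, 1, 0]),
    torch.tensor([1, 0, 0]),
    torch.tensor([1, 1, 1]),
]
total = count_tokens(masks)
print(int(total))  # the only device-to-host read happens here
```

The trade-off is that any code consuming the count must accept a tensor, or defer the conversion to a Python number until a point where synchronizing is acceptable.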

Could you clarify why `item()` is used here?

Thanks for your time and insights!

wan-nan avatar Feb 13 '25 06:02 wan-nan

Hi wan-nan, thanks for looking into it. This is being addressed in an internal MR.

shifangx avatar Mar 25 '25 06:03 shifangx

Marking as stale. No activity in 60 days.

github-actions[bot] avatar May 24 '25 18:05 github-actions[bot]

This issue is fixed by the following commit: https://github.com/NVIDIA/Megatron-LM/commit/87d9d2506acefaf3bd617b27ebbd24c7ddfcea5c

shifangx avatar May 30 '25 04:05 shifangx

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jul 29 '25 02:07 github-actions[bot]