Megatron-LM

[QUESTION] Why is the time of one iteration in nsys longer than in the output log?

Open hanwen-sun opened this issue 11 months ago • 1 comments

I want to compare the training speed of llama2-7b between libai (https://github.com/Oneflow-Inc/libai) and Megatron-LM on an NVIDIA A800-SXM4-80G. However, with Megatron-LM, the time of one iteration measured in nsys is longer than the time reported in the output log:

  • the log time is:
 iteration      200/    1000 | consumed samples:          200 | elapsed time per iteration (ms): 183.7 | learning rate: 9.375E-06 | global batch size:     1 | lm loss: 7.889984E+00 | loss scale: 1.0 | grad norm: 4.921 | number of skipped iterations:   0 | number of nan iterations:   0 |
  • the nsys time is: 1710414973190, and I can't find many gaps in the CUDA stream.
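
To line the two numbers up, the per-iteration figure can be pulled out of the log line and set against the iteration span read off the nsys timeline. Below is a minimal sketch of that extraction. The helper name `iteration_time_ms` is hypothetical and not part of Megatron-LM, and the regex only assumes the `elapsed time per iteration (ms): <number>` field shown in the log line above.

```python
import re

# Assumption: the log contains a field of the form
# "elapsed time per iteration (ms): 183.7", as in the line quoted above.
_ITER_TIME = re.compile(
    r"elapsed time per iteration \(ms\):\s*([0-9]+(?:\.[0-9]+)?)"
)


def iteration_time_ms(log_line):
    """Return the logged per-iteration time in ms, or None if the field is absent.

    Hypothetical helper for illustration; not a Megatron-LM API.
    """
    match = _ITER_TIME.search(log_line)
    return float(match.group(1)) if match else None


# Shortened copy of the log line from this issue.
line = (
    " iteration      200/    1000 | consumed samples:          200 | "
    "elapsed time per iteration (ms): 183.7 | learning rate: 9.375E-06 |"
)
print(iteration_time_ms(line))
```

Averaging this value over several logged iterations, rather than reading a single line, should make the comparison with the profiler less sensitive to one-off outliers.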

Can anyone explain this to me?

hanwen-sun, Mar 14 '24 11:03