
[QUESTION] Calculations regarding calculate_per_token_loss parameter

Open clarence-lee-sheng opened this issue 7 months ago • 0 comments

I have two questions about lines 231-233 of megatron/core/pipeline_parallel/schedules.py:

  1. Why are we dividing by num_tokens when the condition is "if not config.calculate_per_token_loss"?
  2. What is the purpose of dividing by num_microbatches if it is a constant? And if it is important, why do we not also divide by num_microbatches outside the condition, for the case where config.calculate_per_token_loss is true?
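
To make the two questions concrete, here is a minimal sketch of the arithmetic involved. It is not the Megatron-LM code: the function names and sample numbers are hypothetical, and it only assumes that each microbatch yields a summed loss and a token count. It compares dividing each microbatch by its own num_tokens and by num_microbatches against dividing the accumulated sum by the total token count.

```python
# Illustrative only: these helpers are hypothetical and are not part of Megatron-LM.

def per_microbatch_mean(loss_sums, token_counts):
    """Average each microbatch over its own tokens, then average over microbatches."""
    num_microbatches = len(loss_sums)
    total = 0.0
    for loss_sum, num_tokens in zip(loss_sums, token_counts):
        # Dividing by num_microbatches here turns the accumulated sum into a mean.
        total += loss_sum / num_tokens / num_microbatches
    return total


def per_token_mean(loss_sums, token_counts):
    """Accumulate raw sums and normalize once by the total number of tokens."""
    return sum(loss_sums) / sum(token_counts)


# Two microbatches with different token counts (made-up numbers).
uneven_a = per_microbatch_mean([8.0, 2.0], [4, 2])
uneven_b = per_token_mean([8.0, 2.0], [4, 2])

# Two microbatches with the same token count (made-up numbers).
even_a = per_microbatch_mean([8.0, 2.0], [4, 4])
even_b = per_token_mean([8.0, 2.0], [4, 4])

print(uneven_a, uneven_b)
print(even_a, even_b)
```

Under these assumptions the two normalizations agree when every microbatch has the same number of tokens and differ otherwise, which may be why the num_microbatches factor is not needed once the loss is normalized by a global token count. Whether that matches the intent of the code in schedules.py is exactly what the questions above ask.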

clarence-lee-sheng, Jul 19 '24 09:07