Megatron-LM

[QUESTION] Calculations regarding calculate_per_token_loss parameter

Open clarence-lee-sheng opened this issue 7 months ago • 0 comments

Regarding lines 231-233 of megatron/core/pipeline_parallel/schedules.py, I have two questions:

1. Why do we divide by num_tokens when the condition is "if not config.calculate_per_token_loss"?
2. What is the purpose of dividing by num_microbatches if it is a constant? And if it is important, why do we not also divide by num_microbatches outside the conditional, i.e. in the case where config.calculate_per_token_loss is true?
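
The arithmetic behind both questions can be sketched in plain Python. This is an illustration only, not the Megatron-LM code: it assumes each microbatch yields a loss summed over its tokens together with a token count, and the function names `mean_of_microbatch_means` and `per_token_mean` are hypothetical.

```python
# Hypothetical illustration, not Megatron-LM code.
# Each microbatch is modeled as a (summed_loss, num_tokens) pair.

def mean_of_microbatch_means(microbatches):
    """Divide each summed loss by its own token count and by the number
    of microbatches, then accumulate (sum) the scaled contributions."""
    num_microbatches = len(microbatches)
    total = 0.0
    for summed_loss, num_tokens in microbatches:
        total += summed_loss / num_tokens / num_microbatches
    return total

def per_token_mean(microbatches):
    """Accumulate raw sums and token counts, then divide once at the end."""
    total_loss = sum(loss for loss, _ in microbatches)
    total_tokens = sum(tokens for _, tokens in microbatches)
    return total_loss / total_tokens

# Two microbatches with different token counts.
microbatches = [(8.0, 4), (2.0, 2)]

a = mean_of_microbatch_means(microbatches)  # (8/4)/2 + (2/2)/2 = 1.5
b = per_token_mean(microbatches)            # (8 + 2) / (4 + 2) = 10/6

print(a, b)
```

Because contributions are summed across microbatches during accumulation, the 1/num_microbatches factor turns that sum into a mean, so it changes the scale even though it is a constant. With unequal token counts the two schemes give different values (1.5 versus about 1.667), which would be consistent with skipping both per-microbatch divisions when a single normalization by the total token count is applied later. Whether and where Megatron-LM applies that later normalization is an assumption here and is not shown by the quoted lines.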

Jul 19 '24 09:07 clarence-lee-sheng