Megatron-LM
fix: count_zeros protection in chained optimizer
Make ChainedOptimizer honor log_num_zeros_in_grad so that behavior stays consistent when ChainedOptimizer ends up being used silently, e.g. with EP>1.
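
As a rough illustration of the intended behavior, here is a minimal, framework-free sketch. It is not the Megatron-LM implementation: ToyConfig, ToyOptimizer, and ToyChainedOptimizer are hypothetical stand-ins, and only the flag name log_num_zeros_in_grad is taken from this change. It assumes the chained wrapper should count zero gradients across its sub-optimizers only when the flag is set, and skip the count otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the optimizer config; only this flag matters here.
    log_num_zeros_in_grad: bool = False


class ToyOptimizer:
    """Hypothetical sub-optimizer holding a flat list of gradient values."""

    def __init__(self, grads: List[float]):
        self.grads = grads

    def count_zeros(self) -> int:
        # Number of gradient entries that are exactly zero.
        return sum(1 for g in self.grads if g == 0.0)


class ToyChainedOptimizer:
    """Hypothetical wrapper that forwards to several sub-optimizers."""

    def __init__(self, optimizers: List[ToyOptimizer], config: ToyConfig):
        self.optimizers = optimizers
        self.config = config

    def num_zeros_in_grad(self) -> Optional[int]:
        # Protect the count behind the flag, mirroring what a single
        # optimizer is assumed to do: no flag, no count.
        if not self.config.log_num_zeros_in_grad:
            return None
        return sum(opt.count_zeros() for opt in self.optimizers)


# Two sub-optimizers, as might happen when parameters are split into groups.
subs = [ToyOptimizer([0.0, 1.0]), ToyOptimizer([0.0, 0.0, 2.0])]
with_flag = ToyChainedOptimizer(subs, ToyConfig(log_num_zeros_in_grad=True))
without_flag = ToyChainedOptimizer(subs, ToyConfig(log_num_zeros_in_grad=False))
```

With the flag set, the wrapper reports the sum of zero counts over all sub-optimizers; with the flag unset, it returns None and never calls count_zeros on them.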