guozhen1997

1 issue by guozhen1997

**MOE training Loss inconsistent after resume from old checkpoint**

Experimental conditions:
- the latest main branch
- use mcore
- expert-model-parallel-size > 1

The black line runs continuously and saves...