guozhen1997
Results: 1 issue of guozhen1997
**MOE training Loss inconsistent after resume from old checkpoint**

Experimental conditions:
- the latest main branch
- use mcore
- expert-model-parallel-size > 1

The black line runs continuously and saves...