Jen Wei

2 issues by Jen Wei

This PR adds the MuonW optimizer to OLMo, implementing the Muon optimization algorithm with AdamW fallback for non-matrix parameters.

**Key features**:
- Implements Muon's Newton-Schulz orthogonalization for matrix parameters (2D+)...
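
For readers unfamiliar with the orthogonalization step mentioned in this summary, here is a minimal, dependency-free sketch of the idea. It is not the PR's code: the function names (`newton_schulz_orthogonalize`, `matmul`, `transpose`) and the `steps` and `eps` parameters are hypothetical, and it uses the classical cubic Newton-Schulz iteration rather than the tuned higher-order polynomial that Muon implementations typically use, since the cubic form is easier to verify by hand.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30, eps=1e-7):
    """Approximate the semi-orthogonal polar factor of g (illustrative only).

    Uses the classical cubic iteration X <- 1.5 X - 0.5 (X X^T) X,
    an assumption made for clarity, not the coefficients of any real optimizer.
    """
    # Work in the wide orientation so X X^T is the smaller Gram matrix.
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else [list(row) for row in g]

    # Frobenius normalization makes every singular value at most 1,
    # which keeps the iteration inside its convergence region.
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]

    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * v - 0.5 * w for v, w in zip(row_x, row_p)]
            for row_x, row_p in zip(x, xxt_x)
        ]

    return transpose(x) if tall else x
```

Each step pushes the nonzero singular values of the matrix toward 1 while leaving its singular vectors unchanged, so a gradient matrix is replaced by a nearby matrix with orthonormal rows or columns. Parameters that are not matrices have no such structure, which is consistent with the AdamW fallback the summary describes.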

I've been studying SmolLM3's dual-mode training approach and have a technical question about the choice of Anchored Preference Optimization (APO) over Group Relative Policy Optimization (GRPO) for handling reasoning capabilities....