Ashwath Aithal
@meatybobby what is the status? Can we cross-post the Automodel PR?
@akoumpa is there a reason you don't want to use Muon from: https://github.com/NVIDIA-NeMo/Emerging-Optimizers
@sanjana-inflection can you please respond to the request?
@yfw please opine
Updating the status here from @yfw: This seems like a large model, so we will most likely need to use the mcore path for this. We recently merged this...
@ZhiyuLi-Nvidia is this something you can review?
@sharathts can you please take a look and opine?
@yaoyu-33 @yfw can we get a review for this?
@guyueh1 can we also add a large model like 70B? @joyang-nv we also need FP8 policy in the DTensor path. We should enable this after we move to Automodel...
@katec846 please post the latest status