
Support for MS-AMP in FSDP

Open · naveenkumarmarri opened this issue on Nov 05 '23 · 3 comments

What would you like to be added: Support for MS-AMP in FSDP.

Why is this needed: This will help train large models with optimizer state sharding.

naveenkumarmarri · Nov 05 '23
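For context, here is a hypothetical sketch of what the requested usage might look like. `msamp.initialize` is MS-AMP's documented entry point; the FSDP wrapping step is the feature being requested and is not supported at the time of this issue.

```python
# Hypothetical sketch of the requested MS-AMP + FSDP integration.
# Assumes torch.distributed has already been initialized
# (e.g. via torchrun and dist.init_process_group).
import torch
import msamp
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters())

# MS-AMP's documented API: casts eligible weights to low precision
# and wraps the optimizer to keep master weights / scales.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

# The feature request: shard the low-precision model's parameters
# and optimizer state across ranks with FSDP. Not supported yet.
model = FSDP(model)
```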

Thanks for your interest in our work!

We will support MS-AMP in FSDP :)

wkcn · Nov 05 '23

@wkcn is there a timeline you guys are targeting for FSDP integration?

naveenkumarmarri · Nov 10 '23

When applying FP8 to FSDP, there are two problems we need to solve:

1. FSDP requires that all parameters have the same dtype. If we change only some parameters to FP16/FP8, this rule is broken.
2. Each FP8 tensor has a scaling factor. When updating a parameter in the optimizer, we need to synchronize its scaling factor across ranks (see the sketch after this comment).

tocean · Dec 27 '23
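A minimal sketch of the scaling-factor synchronization in problem 2, assuming a hypothetical helper rather than MS-AMP's actual API: each rank computes the max absolute value (amax) of its local parameter shard, and an all-reduce with MAX ensures every rank derives the same scale before re-quantizing.

```python
# Hypothetical sketch, not MS-AMP's implementation: synchronize a
# per-tensor FP8 scaling factor across FSDP ranks.
import torch
import torch.distributed as dist

def sync_fp8_scale(local_amax: torch.Tensor, fp8_max: float = 448.0) -> torch.Tensor:
    """Derive a shared scaling factor from each rank's local amax.

    local_amax: max(abs(shard)) computed on this rank's parameter shard.
    fp8_max: largest representable magnitude of the FP8 format
             (~448 for E4M3; an assumption of this sketch).
    """
    # Every rank must quantize with the same scale, so reduce with MAX:
    # scaling by the global amax guarantees no shard overflows FP8 range.
    dist.all_reduce(local_amax, op=dist.ReduceOp.MAX)
    return fp8_max / local_amax
```

Reducing with MAX (rather than, say, averaging) is the natural choice here, since the resulting scale must accommodate the largest value on any rank to avoid overflow when casting back to FP8.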