torchrec
Extend FSDP1 global clipping support for optimizers other than Shampoo
Summary:
This diff is a follow-up to D73474285 and lets other dense optimizers use the enable_global_grad_clip optim config. When enable_global_grad_clip=True and FSDP1 is used, the global gradient norm is computed across ranks at the cost of extra communication.
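For context, here is a minimal sketch of how global gradient-norm clipping typically works under FSDP1-style sharding, where each rank holds only a shard of the parameters and one extra all-reduce combines the per-rank squared norms. This illustrates the technique only, not torchrec's actual implementation; clip_grad_norm_global and its arguments are hypothetical names.

```python
import torch
import torch.distributed as dist


def clip_grad_norm_global(sharded_params, max_norm: float) -> torch.Tensor:
    """Clip gradients by the global L2 norm across all ranks.

    Under FSDP1 each rank owns a disjoint shard of the flattened
    parameters, so the true global norm needs one extra all-reduce
    of the squared local norms (the "extra communication" above).
    """
    grads = [p.grad for p in sharded_params if p.grad is not None]
    # Sum of squared gradient entries over this rank's shards only.
    local_sq = torch.stack([g.detach().pow(2).sum() for g in grads]).sum()
    # Extra communication: combine per-rank sums into the global sum.
    dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)
    global_norm = local_sq.sqrt()
    # Scale every local gradient shard by the same global factor.
    clip_coef = (max_norm / (global_norm + 1e-6)).clamp(max=1.0)
    for g in grads:
        g.mul_(clip_coef)
    return global_norm
```

Because shards are disjoint, each gradient entry is counted exactly once in the global sum, so the result matches clipping the unsharded gradient.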
Next steps:
- Make global clipping more generic so it also works with FSDP2.
Differential Revision: D73969566
Privacy Context Container: L1235913
This pull request was exported from Phabricator. Differential Revision: D73969566