torchrec
Global gradient clipping for 2D DTensor
Summary: For DTensor-based gradient clipping in the 2D (or nD) parallelism case, we do not know which mesh dimension corresponds to FSDP. To avoid potential accuracy issues, we always perform global clipping.
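As a rough illustration of what "global clipping" means here, the sketch below is plain Python rather than the torchrec or DTensor code in this change. It assumes a hypothetical 2x2 device mesh in which every rank holds a disjoint gradient shard, and it uses a plain sum as a stand-in for an all-reduce over every rank in the mesh. The names `local_sq_sum`, `global_clip_coef`, and `clip_global` are made up for this example.

```python
import math


def local_sq_sum(shard):
    """Sum of squared gradient entries held by one rank."""
    return sum(g * g for g in shard)


def global_clip_coef(shards_by_rank, max_norm, eps=1e-6):
    """Compute the L2 norm over ALL ranks and the resulting clip coefficient.

    The sum over ranks stands in for an all-reduce(SUM) across the whole
    mesh, not across a single mesh dimension.
    Assumption: shards are disjoint, so no entry is counted twice.
    """
    total_sq = sum(local_sq_sum(s) for s in shards_by_rank.values())
    total_norm = math.sqrt(total_sq)
    coef = min(1.0, max_norm / (total_norm + eps))
    return total_norm, coef


def clip_global(shards_by_rank, max_norm):
    """Scale every shard by the same coefficient derived from the global norm."""
    total_norm, coef = global_clip_coef(shards_by_rank, max_norm)
    clipped = {
        rank: [g * coef for g in shard]
        for rank, shard in shards_by_rank.items()
    }
    return total_norm, clipped


# Hypothetical 2x2 mesh: keys are (row, column) mesh coordinates.
shards = {
    (0, 0): [3.0, 0.0],
    (0, 1): [0.0, 4.0],
    (1, 0): [0.0, 0.0],
    (1, 1): [12.0, 0.0],
}
total_norm, clipped = clip_global(shards, max_norm=1.0)
```

Because the norm is reduced over the whole mesh, every rank applies the same scaling factor no matter which mesh dimension is used for FSDP. If some gradients were replicated rather than sharded along a dimension, a plain sum like this would count them more than once, so the disjoint-shard assumption matters.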
Differential Revision: D62609765
This pull request was exported from Phabricator. Differential Revision: D62609765