Global gradient clipping for 2D DTensor

Open yoyoyocmu opened this issue 1 year ago • 2 comments

Summary: For DTensor-based gradient clipping in the 2D or nD parallelism case, we don't know which dimension corresponds to FSDP. To avoid potential accuracy issues, we always do global clipping.
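
To illustrate what "global clipping" means here, the following is a minimal pure-Python sketch, not the torchrec implementation. It assumes gradients are plain lists of floats, one list per shard, and the name `global_clip_` is hypothetical. The point it shows is that a single L2 norm is computed over every shard and a single scaling coefficient is applied to all of them, rather than clipping each shard group by its own norm.

```python
import math
from typing import List


def global_clip_(shards: List[List[float]], max_norm: float) -> float:
    """Clip all gradient shards in place by their combined L2 norm.

    Hypothetical helper for illustration only. Returns the norm
    measured before clipping.
    """
    # Sum of squares over every entry of every shard. In a distributed
    # run this partial sum would be reduced across all ranks, not just
    # the ranks of one mesh dimension.
    total_sq = sum(g * g for shard in shards for g in shard)
    total_norm = math.sqrt(total_sq)

    # One coefficient shared by all shards, so the direction of the
    # full gradient is preserved. The small epsilon avoids division
    # by zero when all gradients are zero.
    coef = min(1.0, max_norm / (total_norm + 1e-6))

    for shard in shards:
        for i in range(len(shard)):
            shard[i] *= coef
    return total_norm
```

Clipping each shard group independently would instead rescale the groups by different factors, which changes the direction of the overall gradient; that is one way a per-dimension approach could affect accuracy when the FSDP dimension is not known.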

Differential Revision: D62609765

yoyoyocmu, Sep 12 '24 22:09

This pull request was exported from Phabricator. Differential Revision: D62609765

facebook-github-bot, Sep 12 '24 22:09

This pull request was exported from Phabricator. Differential Revision: D62609765

facebook-github-bot, Sep 12 '24 23:09