torchrec
Turn the dummy_tensor's grad off
Summary: bdhirsh introduced a change in D51418076 where intermediate leaf tensors that require grad cause a graph break.
This leads to graph breaks when training our APS model. Example: P1156881935
The graph breaks happen on the dummy_tensor, which requires grad. However, this is not necessary: in the original diff D38469224, Ying ran an experiment showing that its grad is never populated.
Therefore, this diff turns grad off on the dummy_tensor to avoid graph breaks in APS model training.
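As a rough illustration of why the flag can be dropped, here is a minimal sketch and not the torchrec code: `make_dummy` and `run_step` are hypothetical names, and the sketch assumes the dummy tensor does not contribute to the loss. Under that assumption its `.grad` stays `None` whether or not it requires grad, while the real parameter still receives its gradient.

```python
import torch


def make_dummy(requires_grad: bool) -> torch.Tensor:
    # Hypothetical stand-in for the dummy tensor; the real one lives in torchrec.
    return torch.zeros(1, requires_grad=requires_grad)


def run_step(requires_grad: bool):
    # A real parameter that the loss depends on.
    weight = torch.ones(3, requires_grad=True)
    # Leaf tensor created in the middle of the step (assumption: unused by the loss).
    dummy = make_dummy(requires_grad)
    loss = (weight * 2.0).sum()
    loss.backward()
    return dummy, weight


dummy_on, weight_on = run_step(requires_grad=True)
dummy_off, weight_off = run_step(requires_grad=False)

# In both cases the dummy tensor never receives a gradient.
print(dummy_on.grad, dummy_off.grad)
# The real parameter's gradient is identical either way.
print(torch.equal(weight_on.grad, weight_off.grad))
```

The sketch only covers eager-mode autograd; it does not reproduce the graph break itself, which depends on the tracing behavior introduced in D51418076.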
Differential Revision: D53449759
This pull request was exported from Phabricator. Differential Revision: D53449759