torchrec
Extend memory freeing to other PipelinedForwards
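Summary: Extends memory freeing to the other PipelinedForward variants. The biggest win is in the semi-sync pipeline (TrainPipelineSemiSync), where P90 memory drops from 11.932 GB to 10.332 GB while P90 runtime stays essentially unchanged.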
Post diff
TrainPipelineBase | Runtime (P90): 10.098 s | Memory (P90): 8.418 GB
TrainPipelineSparseDist | Runtime (P90): 10.050 s | Memory (P90): 8.655 GB
TrainPipelineSemiSync | Runtime (P90): 9.541 s | Memory (P90): 10.332 GB
PrefetchTrainPipelineSparseDist | Runtime (P90): 10.063 s | Memory (P90): 8.918 GB

Pre diff
TrainPipelineBase | Runtime (P90): 10.125 s | Memory (P90): 8.418 GB
TrainPipelineSparseDist | Runtime (P90): 10.033 s | Memory (P90): 8.654 GB
TrainPipelineSemiSync | Runtime (P90): 9.529 s | Memory (P90): 11.932 GB
PrefetchTrainPipelineSparseDist | Runtime (P90): 10.109 s | Memory (P90): 8.910 GB
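To make the comparison easier to read, the per-pipeline memory change can be worked out from the numbers above. The snippet below is only an illustrative sketch: the dictionaries and the helper name `memory_delta_gb` are hypothetical and not part of torchrec, and the values are copied from the pre/post benchmark rows in this summary.

```python
# P90 (runtime in seconds, memory in GB) per pipeline, copied from this summary.
# The dict layout and helper below are illustrative, not torchrec APIs.
pre = {
    "TrainPipelineBase": (10.125, 8.418),
    "TrainPipelineSparseDist": (10.033, 8.654),
    "TrainPipelineSemiSync": (9.529, 11.932),
    "PrefetchTrainPipelineSparseDist": (10.109, 8.910),
}
post = {
    "TrainPipelineBase": (10.098, 8.418),
    "TrainPipelineSparseDist": (10.050, 8.655),
    "TrainPipelineSemiSync": (9.541, 10.332),
    "PrefetchTrainPipelineSparseDist": (10.063, 8.918),
}


def memory_delta_gb(name: str) -> float:
    """Post-diff minus pre-diff P90 memory; negative means memory was saved."""
    return round(post[name][1] - pre[name][1], 3)


deltas = {name: memory_delta_gb(name) for name in pre}

# The pipeline with the most negative delta saved the most memory.
largest_saving = min(deltas, key=deltas.get)

for name, delta in deltas.items():
    pct = 100.0 * delta / pre[name][1]
    print(f"{name}: {delta:+.3f} GB ({pct:+.1f}%)")
print(f"Largest saving: {largest_saving}")
```

Under these numbers, TrainPipelineSemiSync is the only pipeline with a meaningful change (about 1.6 GB less P90 memory); the other three move by 0.008 GB or less.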
Differential Revision: D57169568
This pull request was exported from Phabricator. Differential Revision: D57169568