
Improve reshard_after_forward logic

Open • tianyu-l opened this issue 8 months ago • 1 comment

This PR updates the reshard_after_forward logic according to the discussion in https://github.com/pytorch/torchtitan/issues/1091.

The CI failure occurs because FSDPMemTracker is not compatible with fully_shard applied to a list of modules. @sanketpurandare will help address this soon. Let's land this PR once that support is available.

tianyu-l, Apr 11 '25 23:04

@tianyu-l I think it's also acceptable for now to let the norm be assigned to the root module. In other words, just wrap tok_embeddings and output separately.

awgu, Apr 12 '25 01:04
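To make awgu's alternative concrete, here is a minimal, torch-free sketch that lists the fully_shard calls it implies, in order, together with the reshard_after_forward value each call would receive. The helper plan_fsdp_groups, the group labels, and the "default"/"always"/"never" policy names are illustrative assumptions, not torchtitan's or PyTorch's API. The key point is that norm gets no fully_shard call of its own, so its parameters fall to the final fully_shard(model) on the root; per the fully_shard documentation, FSDP2 keeps the root's parameters unsharded after forward as a heuristic, since backward would typically all-gather them immediately.

```python
from typing import List, Optional, Tuple


# Hypothetical helper (not a torchtitan or PyTorch API): returns the
# fully_shard calls in the order they would be issued, paired with the
# reshard_after_forward value each call would receive (None = FSDP decides).
def plan_fsdp_groups(
    layer_names: List[str], policy: str = "default"
) -> List[Tuple[str, Optional[bool]]]:
    # Assumed policy names, chosen for illustration only.
    if policy not in ("default", "always", "never"):
        raise ValueError(f"unknown reshard policy: {policy!r}")

    # Under "always" and "default", every non-final group reshards after forward.
    reshard = policy != "never"
    plan: List[Tuple[str, Optional[bool]]] = [("tok_embeddings", reshard)]
    plan += [(f"layers.{name}", reshard) for name in layer_names]

    # output runs last in forward and first in backward, so under "default"
    # its parameters stay unsharded to avoid an immediate re-all-gather.
    plan.append(("output", policy == "always"))

    # norm is not wrapped on its own: its parameters fall to the root
    # fully_shard(model) call, whose reshard behavior FSDP2 chooses itself.
    plan.append(("<root: norm and any leftovers>", None))
    return plan
```

Calling plan_fsdp_groups(["0", "1"]) yields True for tok_embeddings and both layers, False for output, and None for the root entry. Leaving norm on the root means no fully_shard call receives a list of modules, which sidesteps the FSDPMemTracker incompatibility noted earlier in this thread.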

Please rebase so the PR can be merged.

wwwjn, Jul 29 '25 02:07