sanketpurandare

Results 4 issues of sanketpurandare

Added a BertLarge model with 16 attention heads, 24 hidden layers, and a hidden size of 1024. Added ddp_trainer as an example for benchmarking the ddp, all_reduce and non_ddp version...
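
For illustration only, the three sizes quoted in the description above could be captured in a small config object. This is a minimal sketch, not the code from the PR: the class name `BertLargeConfig`, its field names, and the `head_dim` helper are hypothetical, and only the three numbers come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertLargeConfig:
    """Hypothetical container for the sizes quoted in the description."""

    num_attention_heads: int = 16
    num_hidden_layers: int = 24
    hidden_size: int = 1024

    @property
    def head_dim(self) -> int:
        # In multi-head attention the hidden size is split evenly across heads,
        # so it must be divisible by the number of heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


config = BertLargeConfig()
print(config.head_dim)
```

With these values each attention head works on a 64-dimensional slice of the hidden state.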

cla signed

Introducing a dispatch-based memory tracker for tracking FSDP2 memory and module-wise memory breakdown. cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol...

oncall: distributed
topic: not user facing

A tool for measuring module-wise memory consumption. It also provides a breakdown of memory consumption in terms of parameters, gradients, activations, and optimizer state. Works in fake tensor mode...

This PR adds a basic Runtime Estimator for single-device models. It estimates the GPU runtime in milliseconds using various estimation methods under the ``FakeTensorMode``. It provides a ``TorchDispatchMode``-based context...

oncall: distributed
triaged
open source
topic: not user facing
release notes: distributed (tools)