sanketpurandare
sanketpurandare
Allred
Added a BertLarge model with 16 attention heads, 24 hidden layers and a hidden size of 1024. Added ddp_trainer as an example that allows benchmarking the ddp, all_reduce and non_ddp version...
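To make the stated dimensions concrete, here is a minimal sketch that derives the per-head dimension and an approximate parameter count from them. Only the head count, layer count and hidden size come from the description above. The `BertConfig` class, the helper names, and the remaining values (vocabulary size, feed-forward width, position and token-type table sizes) are assumptions taken from the commonly published BERT-Large configuration, not from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    # Values stated in the PR description.
    num_attention_heads: int = 16
    num_hidden_layers: int = 24
    hidden_size: int = 1024
    # ASSUMPTIONS: not stated in the PR; typical BERT-Large defaults.
    intermediate_size: int = 4096
    vocab_size: int = 30522
    max_position_embeddings: int = 512
    type_vocab_size: int = 2


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus bias of a dense layer."""
    return in_features * out_features + out_features


def layer_norm_params(hidden: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * hidden


def head_dim(cfg: BertConfig) -> int:
    """Width of a single attention head."""
    return cfg.hidden_size // cfg.num_attention_heads


def count_parameters(cfg: BertConfig) -> int:
    """Approximate parameter count of the encoder with a pooler, no task head."""
    h = cfg.hidden_size
    embeddings = (
        cfg.vocab_size * h
        + cfg.max_position_embeddings * h
        + cfg.type_vocab_size * h
        + layer_norm_params(h)
    )
    attention = 4 * linear_params(h, h) + layer_norm_params(h)  # Q, K, V, output
    feed_forward = (
        linear_params(h, cfg.intermediate_size)
        + linear_params(cfg.intermediate_size, h)
        + layer_norm_params(h)
    )
    pooler = linear_params(h, h)
    return embeddings + cfg.num_hidden_layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    cfg = BertConfig()
    print("head dim:", head_dim(cfg))
    print("parameters:", count_parameters(cfg))
```

Under these assumptions the count lands in the range usually quoted for BERT-Large; the model added in the PR may differ if it uses other vocabulary or feed-forward sizes.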
Introducing a dispatch-based memory tracker for tracking FSDP2 memory and the module-wise memory breakdown. cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol...
A tool for measuring module-wise memory consumption. It also provides a breakdown of memory consumption in terms of parameters, gradients, activations and optimizer state. Works in fake tensor mode...
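The four categories in this breakdown can be illustrated with plain arithmetic. The sketch below is not the tool's implementation and does not track real allocations: `memory_breakdown` is a hypothetical helper that estimates bytes per module from parameter counts, assuming 4-byte elements, gradients the same size as the parameters, and an Adam-style optimizer that keeps two state tensors per parameter. Activation sizes depend on inputs, so they are passed in rather than derived.

```python
from typing import Dict, Optional


def memory_breakdown(
    param_counts: Dict[str, int],
    bytes_per_element: int = 4,            # ASSUMPTION: fp32 parameters
    optimizer_states_per_param: int = 2,   # ASSUMPTION: Adam-style moments
    activation_bytes: Optional[Dict[str, int]] = None,
) -> Dict[str, Dict[str, int]]:
    """Estimate bytes per module, split into the four categories."""
    activation_bytes = activation_bytes or {}
    report: Dict[str, Dict[str, int]] = {}
    for name, count in param_counts.items():
        param_bytes = count * bytes_per_element
        entry = {
            "parameters": param_bytes,
            "gradients": param_bytes,  # one gradient element per parameter
            "optimizer_state": optimizer_states_per_param * param_bytes,
            "activations": activation_bytes.get(name, 0),
        }
        entry["total"] = sum(entry.values())
        report[name] = entry
    return report


if __name__ == "__main__":
    # Hypothetical module names and sizes, for illustration only.
    counts = {"embeddings": 2_000, "encoder.layer.0": 1_000}
    acts = {"encoder.layer.0": 500}
    for module, entry in memory_breakdown(counts, activation_bytes=acts).items():
        print(module, entry)
```

A tracker that observes actual tensors would replace the parameter-count input with measured sizes; the per-module, per-category layout of the report is the part this sketch is meant to show.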
This PR adds a basic Runtime Estimator for single-device models. It estimates the GPU runtime in milliseconds using various estimation methods under ``FakeTensorMode``. It provides a ``TorchDispatchMode``-based context...