Ananth Subramaniam
I believe this will be a prerequisite for further consolidation efforts like https://github.com/PyTorchLightning/pytorch-lightning/pull/11021 and https://github.com/PyTorchLightning/pytorch-lightning/pull/11020, since it moves the `setup_environment` logic of DDP and TPU spawn up.
Merging through https://github.com/pytorch/tnt/pull/179
You might find this library useful for such primitives, especially to support distributed checkpointing: https://github.com/pytorch/torchsnapshot @yifuwang
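For anyone landing here, a minimal sketch of what checkpointing with torchsnapshot looks like, based on its `Snapshot.take`/`restore` API (the model, optimizer, and path below are illustrative, not from this thread):

```python
import torch
from torchsnapshot import Snapshot

# Stateful objects to checkpoint; torchsnapshot handles nn.Module and
# optimizer state dicts for you (toy model/optimizer for illustration).
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
app_state = {"model": model, "optimizer": optimizer}

# Persist the application state to a directory.
snapshot = Snapshot.take(path="/tmp/my_snapshot", app_state=app_state)

# Later (or in another process), restore the same state in place.
snapshot = Snapshot(path="/tmp/my_snapshot")
snapshot.restore(app_state=app_state)
```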
I discussed this more with @rohan-varma. From the DDP `join` docs (https://pytorch.org/docs/stable/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.join):

> This module currently does not support custom distributed collective operations in the forward pass, such as SyncBatchNorm or...
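For reference, a minimal sketch of the `join()` context manager those docs describe, which lets ranks with uneven numbers of batches finish a DDP epoch without hanging (the model, data, and process-group setup here are illustrative):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank: int, world_size: int) -> None:
    # Assumes MASTER_ADDR/MASTER_PORT are already set in the environment.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = DDP(torch.nn.Linear(4, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Give each rank a different number of batches to simulate uneven inputs.
    num_batches = 5 + rank

    # join() shadows the gradient allreduces for ranks that run out of data,
    # so the remaining ranks' backward() collectives do not hang. Per the
    # quoted docs, it does not cover custom forward-pass collectives such as
    # SyncBatchNorm.
    with model.join():
        for _ in range(num_batches):
            optimizer.zero_grad()
            loss = model(torch.randn(2, 4)).sum()
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()
```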
cc @kandluis @aazolini @yifuwang, who were also curious whether there's a serialization format we stick to for the state dict, or if the contents of the state dict are considered...
Hi @ZhiyuanChen, thanks for creating this issue! Are you referring to memory increases when using multiple metrics like AUROC and AUPRC? If so, as pointed out, this implementation be...
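For context, AUROC and AUPRC are not decomposable, so their metric state caches every `preds`/`target` tensor passed to `update()` until `compute()` is called. A rough sketch of where that memory goes, assuming a recent torchmetrics release with the `BinaryAUROC`/`BinaryAveragePrecision` classes (data and sizes are illustrative):

```python
import torch
from torchmetrics.classification import BinaryAUROC, BinaryAveragePrecision

# Each update() appends the full preds/target tensors to the metric state,
# because AUROC and average precision need all scores to build the curve.
auroc = BinaryAUROC()
auprc = BinaryAveragePrecision()

for _ in range(100):
    preds = torch.rand(1024)                 # predicted probabilities
    target = torch.randint(0, 2, (1024,))    # binary labels
    auroc.update(preds, target)
    auprc.update(preds, target)

# Each metric now holds ~100 * 1024 cached predictions; tracking several such
# metrics multiplies that cached state, which is the memory growth in question.
print(auroc.compute(), auprc.compute())
```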
@yifuwang was this fixed by https://github.com/pytorch/torchsnapshot/pull/104 ?
@carmocca I cannot edit this issue. Would you please remove me, @ninginthecloud, @edward-io, and @jjenniferdai from being tagged? Thanks!
> A question I have about the usage, in DDP user should call Snapshot.take by all ranks ?

Yes, Snapshot.take should always be called on all ranks in a distributed...
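A hedged sketch of that call pattern: `Snapshot.take` acts as a collective, so every rank in the process group must reach it (the spawn and process-group setup below are illustrative, not from this thread):

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torchsnapshot import Snapshot


def worker(rank: int, world_size: int) -> None:
    # Illustrative rendezvous settings for a single-host run.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.parallel.DistributedDataParallel(torch.nn.Linear(4, 1))

    # Collective call: every rank must reach this line, or the job hangs.
    # Each rank contributes its shard of the state to the same snapshot path.
    Snapshot.take(path="/tmp/ddp_snapshot", app_state={"model": model})

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```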
Prior issue: https://github.com/PyTorchLightning/pytorch-lightning/issues/3337