Ananth Subramaniam
Summary: Add a callback to torchTNT which saves checkpoints using torchsnapshot. This relies on the app state mixin defined here: this means users can declare what module/optimizer/etc states they'd like...
We often have this pattern:
```
def get_filesystem(path: str, **kwargs: Any) -> fsspec.AbstractFileSystem:
    """Returns the appropriate filesystem to use when handling the given path."""
    fs, _ = url_to_fs(path, **kwargs)
    return...
```
## 🚀 Feature Request
## Motivation
TorchEval is a newly released library from PyTorch for common evaluation metrics and tools: https://pytorch.org/torcheval/main/ The MosaicML framework currently has a deep integration with...
## 🚀 Feature Request
Add an integration to use https://github.com/pytorch/torchsnapshot
## Motivation
TorchSnapshot is a performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind....
## 🚀 Feature
TorchSnapshot is a performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind. It includes many optimizations to control for memory usage...
Summary: Sometimes users need to recreate the dataloaders at the start of the epoch during the overall training loop. To support this, we allow users to register a creation function...
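The idea in the summary above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual change: `EpochLoop`, `register_dataloader_fn`, and `make_dataloader` are hypothetical names invented for this example, and a `range` stands in for a real dataloader.

```python
from typing import Callable, Iterable, List, Optional


class EpochLoop:
    """Hypothetical training loop (not a real library API)."""

    def __init__(self, max_epochs: int) -> None:
        self.max_epochs = max_epochs
        self._create_dataloader: Optional[Callable[[int], Iterable[int]]] = None

    def register_dataloader_fn(self, fn: Callable[[int], Iterable[int]]) -> None:
        # Store the user-supplied creation function instead of a fixed dataloader.
        self._create_dataloader = fn

    def run(self) -> List[List[int]]:
        if self._create_dataloader is None:
            raise RuntimeError("no dataloader creation function registered")
        seen: List[List[int]] = []
        for epoch in range(self.max_epochs):
            # Recreate the dataloader at the start of every epoch.
            dataloader = self._create_dataloader(epoch)
            seen.append(list(dataloader))
        return seen


def make_dataloader(epoch: int) -> Iterable[int]:
    # Stand-in for building a fresh dataloader; yields three "batches" per epoch.
    return range(epoch * 3, epoch * 3 + 3)


loop = EpochLoop(max_epochs=2)
loop.register_dataloader_fn(make_dataloader)
batches_per_epoch = loop.run()
```

Registering a function rather than a dataloader object lets each epoch see a freshly built iterable, for example one with a different shuffle or shard assignment.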
Summary: Fixes https://github.com/pytorch/torcheval/issues/150 Differential Revision: D47241862