composer
Supercharge Your Model Training
## 🚀 Feature Request Add an integration to use https://github.com/pytorch/torchsnapshot ## Motivation TorchSnapshot is a performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind....
# What does this PR do? Sets the default for sharded and local state dicts to offload_to_cpu=True. This helps avoid OOMs for large models when saving sharded checkpoints. ## Testing...
** Environment ** ``` Collecting system information... --------------------------------- System Environment Report Created: 2023-06-29 13:15:05 PDT --------------------------------- PyTorch information ------------------- PyTorch version: 2.0.1+cu117 Is debug build: False CUDA used to build...
# What does this PR do? # What issue(s) does this change relate to? # Before submitting - [ ] Have you read the [contributor guidelines](https://github.com/mosaicml/composer/blob/dev/CONTRIBUTING.md)? - [ ] Is...
# What does this PR do? Only deepspeed has errors with pydantic 2. Moving the pin down to the deepspeed dependencies, as we don't normally use it in composer.
# What does this PR do? Adds a distributed sync to the `RemoteUploaderDownloader.wait_for_workers` call so that the run does not NCCL timeout while uploading a large checkpoint at the end...
# What does this PR do? Batch up log metrics calls in speed_monitor.py. # What issue(s) does this change relate to? Speed up logging. # Before submitting - [ ]...
Updates the requirements on [torchmetrics](https://github.com/Lightning-AI/torchmetrics) to permit the latest version. Release notes Sourced from torchmetrics's releases. Visualize metrics We are happy to announce that the first major release of Torchmetrics,...
## 🚀 Feature Request I found that MLFlowLogger slows throughput down twice as much as wandbLogger. I saw that there are lots of "import mlflow" statements in https://github.com/mosaicml/composer/blob/dev/composer/loggers/mlflow_logger.py. Is that the root cause?...
## 🚀 Feature Request This package can currently load data from a local path, HTTP, or S3. Is there a plan to support Hugging Face datasets? ## Motivation Some datasets supply...