
Supercharge Your Model Training

Results 263 composer issues

## 🚀 Feature Request Add an integration to use https://github.com/pytorch/torchsnapshot ## Motivation TorchSnapshot is a performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind....

enhancement

# What does this PR do? sets the default for sharded and local state dicts to offload_to_cpu=True. This helps avoid OOMs for large models when saving sharded checkpoints ## Testing...

** Environment ** ``` Collecting system information... --------------------------------- System Environment Report Created: 2023-06-29 13:15:05 PDT --------------------------------- PyTorch information ------------------- PyTorch version: 2.0.1+cu117 Is debug build: False CUDA used to build...

bug

# What does this PR do? # What issue(s) does this change relate to? # Before submitting - [ ] Have you read the [contributor guidelines](https://github.com/mosaicml/composer/blob/dev/CONTRIBUTING.md)? - [ ] Is...

# What does this PR do? Only deepspeed has errors with pydantic 2. Moving the pin down to the deepspeed requirements, as we don't normally use pydantic in composer itself

# What does this PR do? Adds a distributed sync to the `RemoteUploaderDownloader.wait_for_workers` call so that the run does not NCCL timeout while uploading a large checkpoint at the end...

# What does this PR do? Batch up log metrics calls in speed_monitor.py. # What issue(s) does this change relate to? Speed up logging. # Before submitting - [ ]...
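The batching idea in this PR summary can be sketched in isolation. This is a minimal illustration, not the code from `speed_monitor.py`: `RecordingLogger`, `log_one_by_one`, `log_batched`, and the metric names are all hypothetical stand-ins, chosen only to show that collecting values into one dictionary turns several logger calls into a single call.

```python
from typing import Dict, List


class RecordingLogger:
    """Hypothetical stand-in for a logger; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, float]] = []

    def log_metrics(self, metrics: Dict[str, float]) -> None:
        # Store a copy so later mutation of the caller's dict does not change history.
        self.calls.append(dict(metrics))


def log_one_by_one(logger: RecordingLogger, metrics: Dict[str, float]) -> None:
    # Unbatched pattern: one logger call per metric.
    for name, value in metrics.items():
        logger.log_metrics({name: value})


def log_batched(logger: RecordingLogger, metrics: Dict[str, float]) -> None:
    # Batched pattern: a single logger call carrying every metric.
    logger.log_metrics(metrics)


# Illustrative metric names, not taken from the repository.
metrics = {"samples_per_sec": 64.0, "batches_per_sec": 2.0, "elapsed_sec": 120.0}

unbatched = RecordingLogger()
log_one_by_one(unbatched, metrics)

batched = RecordingLogger()
log_batched(batched, metrics)

print(len(unbatched.calls), len(batched.calls))
```

If each logger call carries a fixed overhead (for example a network request to a tracking server), the batched form pays that overhead once per step instead of once per metric.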

Updates the requirements on [torchmetrics](https://github.com/Lightning-AI/torchmetrics) to permit the latest version. Release notes Sourced from torchmetrics's releases. Visualize metrics We are happy to announce that the first major release of Torchmetrics,...

dependencies

## 🚀 Feature Request I found that MLFlowLogger makes throughput about twice as slow as wandbLogger. I saw that there are lots of "import mlflow" statements in https://github.com/mosaicml/composer/blob/dev/composer/loggers/mlflow_logger.py. Is that the root cause?...

enhancement
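As background for the question above about repeated `import mlflow` statements, the sketch below shows how Python handles a function-level import that runs more than once. It uses the standard-library `json` module as a stand-in for `mlflow` (an assumption made so the example runs anywhere), and `log_with_local_import` is a hypothetical function. It does not measure or diagnose the slowdown reported in the issue.

```python
import sys


def log_with_local_import(value: int) -> str:
    # Function-level import, similar in shape to the pattern the issue describes.
    # `json` is a stand-in for `mlflow` in this sketch.
    import json

    return json.dumps({"value": value})


# The first call imports the module (if not already loaded) and caches it.
first = log_with_local_import(1)
cached_module = sys.modules["json"]

# Later calls reuse the cached module object from sys.modules instead of
# executing the module's code again.
second = log_with_local_import(2)
same_module = sys.modules["json"] is cached_module

print(first, second, same_module)
```

Because the module is cached in `sys.modules` after the first import, a repeated `import` statement reduces to a lookup and a name binding, so profiling the actual logging calls would be a reasonable next step before attributing the slowdown to the imports.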

## 🚀 Feature Request This package can currently load data from a local path, HTTP, or S3. Is there a plan to support Hugging Face datasets? ## Motivation Some datasets supply...

enhancement