composer
Set offload_to_cpu=True for sharded and local state dicts
What does this PR do?
Sets the default for sharded and local state dicts to offload_to_cpu=True, so state dicts are moved to CPU memory when they are gathered for saving. This helps avoid OOMs for large models when saving sharded checkpoints.
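For illustration only, the new default can be sketched in plain Python. The helper name `state_dict_config_kwargs` and the tuple `OFFLOADED_BY_DEFAULT` are hypothetical and do not appear in this PR or in Composer; the sketch assumes the state dict type is passed as a string and that an explicit caller value should override the default.

```python
from typing import Any, Dict, Optional

# Hypothetical names for illustration; this is not Composer's actual code.
# State dict types whose config defaults to CPU offload under this change.
OFFLOADED_BY_DEFAULT = ("sharded", "local")


def state_dict_config_kwargs(
    state_dict_type: str,
    offload_to_cpu: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a sharded or local state dict config.

    If the caller does not pass offload_to_cpu, it defaults to True so that
    tensors land in CPU memory instead of staying on the GPU.
    """
    if state_dict_type not in OFFLOADED_BY_DEFAULT:
        raise ValueError(f"unsupported state dict type: {state_dict_type!r}")
    if offload_to_cpu is None:
        offload_to_cpu = True
    return {"offload_to_cpu": offload_to_cpu}


sharded_kwargs = state_dict_config_kwargs("sharded")
local_kwargs = state_dict_config_kwargs("local")
```

Assuming PyTorch 2.0 or later, where `ShardedStateDictConfig` and `LocalStateDictConfig` in `torch.distributed.fsdp` accept an `offload_to_cpu` field, these keyword arguments could be passed straight to those config classes.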
Testing
Ran a manual test of saving 30B checkpoints.
What issue(s) does this change relate to?
https://github.com/mosaicml/llm-foundry/issues/367