distributed-training-guide
Add guidance on DDP bucket_cap_mb for larger models & torch.compile
The default is 25 MB, which is too small even for Llama 8B. It might be best to suggest setting bucket_cap_mb to the size of the largest parameter.
It's unclear how much this actually impacts things. Guidance on when to use torch.compile would also be helpful.
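A minimal sketch of what the guidance could show, assuming the model is already on the local GPU and the process group is initialized (e.g. via torchrun); the `wrap_with_ddp` helper name and the round-up margin are made up for illustration:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_with_ddp(model: torch.nn.Module, local_rank: int) -> DDP:
    # Size gradient buckets to the largest single parameter, so a big
    # embedding/output matrix isn't split across many default 25 MB buckets.
    max_param_mb = max(
        p.numel() * p.element_size() for p in model.parameters()
    ) / (1024 ** 2)

    return DDP(
        model,
        device_ids=[local_rank],
        bucket_cap_mb=int(max_param_mb) + 1,  # round up a little (arbitrary margin)
    )
```

For torch.compile, the guide could at least show the basic call, e.g. `model = torch.compile(model)` before or after the DDP wrap, and then discuss when the compile overhead is worth it.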