distributed-training-guide
Add guidance on DDP bucket_cap_mb for larger models & torch.compile
The default is 25 MB, which is too small even for Llama 8B. It might be best to suggest setting bucket_cap_mb to the size of the largest parameter.
It's unclear how much this actually impacts things. Guidance on when to use torch.compile would also be helpful.
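A minimal sketch of what the guidance could show, assuming the model is already on the local GPU and the process group is initialized (e.g. via torchrun); the `wrap_with_ddp` helper name and the round-up margin are made up for illustration:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_with_ddp(model: torch.nn.Module, local_rank: int) -> DDP:
    # Size gradient buckets to the largest single parameter, so a big
    # embedding/output matrix isn't split across many default 25 MB buckets.
    max_param_mb = max(
        p.numel() * p.element_size() for p in model.parameters()
    ) / (1024 ** 2)

    return DDP(
        model,
        device_ids=[local_rank],
        bucket_cap_mb=int(max_param_mb) + 1,  # round up a little (arbitrary margin)
    )
```

For torch.compile, the guide could at least show the basic call, e.g. `model = torch.compile(model)` before or after the DDP wrap, and then discuss when the compile overhead is worth it.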