distributed-training-guide suggestions for more examples

Open daire-byrne opened this issue 2 weeks ago • 1 comment

I finally made it to the end of the examples (well, the DeepSpeed one just hangs for me), and now I'm hungry for more!

I think it would be nice to include one on HSDP (FSDP + DDP) to sit between the DDP, FSDP and the 2D (TP + FSDP) sections. And maybe one on context/sequence parallelism too?

I think it is common to have access to some sort of PCIe-based multi-GPU server, and being able to use GPUs across sockets is a useful trick for HSDP.
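To make the request concrete, here is a rough sketch of the rank layout HSDP implies: ranks inside a shard group split one copy of the model between them (the FSDP part), and ranks that hold the same shard in different copies all-reduce gradients with each other (the DDP part). The helper name `hsdp_groups` and the 8-GPU, two-sockets-of-four split are assumptions made up for illustration, not something taken from the guide. It is plain Python and only computes the groups; it does not touch PyTorch.

```python
def hsdp_groups(world_size: int, shard_size: int):
    """Return (shard_groups, replicate_groups) for a 2D (replicate, shard) layout.

    Hypothetical helper for illustration only; ranks are laid out row-major,
    so each row is one shard group.
    """
    if world_size % shard_size != 0:
        raise ValueError("world_size must be divisible by shard_size")
    num_replicas = world_size // shard_size

    # Each row holds the ranks that shard one model copy between them (FSDP).
    shard_groups = [
        list(range(r * shard_size, (r + 1) * shard_size))
        for r in range(num_replicas)
    ]
    # Each column holds the ranks that own the same shard in every copy;
    # gradients are all-reduced across these ranks (DDP-style).
    replicate_groups = [
        list(range(c, world_size, shard_size))
        for c in range(shard_size)
    ]
    return shard_groups, replicate_groups


# Assumed example: 8 GPUs, sharding within each group of 4 (e.g. one socket).
shard_groups, replicate_groups = hsdp_groups(world_size=8, shard_size=4)
print(shard_groups)
print(replicate_groups)
```

With this split, parameter all-gathers stay inside each group of four GPUs and only the gradient all-reduce crosses between the groups, which is why it seems like a good fit for a dual-socket PCIe box.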

Pipeline parallelism is just too hard to get my head around so you can ignore that one!

Many thanks again for putting this all together.

Nov 19 '25 23:11 daire-byrne
