distributed-training-guide

suggestions for more examples

Open · daire-byrne opened this issue 2 weeks ago · 1 comment

I finally made it to the end of the examples (well, the DeepSpeed one just hangs for me), and now I'm hungry for more!

I think it would be nice to include one on HSDP (FSDP + DDP) to sit between the DDP, FSDP and the 2D (TP + FSDP) sections. And maybe one on context/sequence parallelism too?

I think it is common to have access to some sort of PCIe-based multi-GPU server, and being able to use GPUs across sockets is a useful trick for HSDP.
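For illustration only, here is a minimal sketch of the rank grouping that HSDP implies: parameters are sharded inside each group (FSDP-style) and gradients are all-reduced between groups (DDP-style). The 8-GPU, two-socket layout and the helper name `hsdp_groups` are assumptions made up for this example, not something taken from the guide. It is plain Python and does not touch any training library.

```python
def hsdp_groups(world_size: int, shard_size: int):
    """Return (shard_groups, replicate_groups) for a row-major rank layout.

    Hypothetical helper for illustration; not part of any library.
    """
    if world_size % shard_size != 0:
        raise ValueError("world_size must be divisible by shard_size")
    num_replicas = world_size // shard_size

    # Ranks in one shard group split the parameters between them (FSDP-style).
    shard_groups = [
        list(range(r * shard_size, (r + 1) * shard_size))
        for r in range(num_replicas)
    ]

    # Ranks holding the same shard in different groups all-reduce
    # gradients with each other (DDP-style).
    replicate_groups = [
        list(range(s, world_size, shard_size)) for s in range(shard_size)
    ]
    return shard_groups, replicate_groups


if __name__ == "__main__":
    # Assumed layout: 2 sockets x 4 GPUs, sharding within a socket.
    shard, replicate = hsdp_groups(world_size=8, shard_size=4)
    print("shard groups:    ", shard)
    print("replicate groups:", replicate)
```

With these assumed sizes, the heavier sharding traffic stays within a socket and only the gradient all-reduce crosses sockets, which is the property that makes HSDP attractive on this kind of server.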

Pipeline parallelism is just too hard to get my head around so you can ignore that one!

Many thanks again for putting this all together.

daire-byrne · Nov 19 '25 23:11