distributed-training-guide
suggestions for more examples
I finally made it to the end of the examples (well, the DeepSpeed one just hangs for me), and now I'm hungry for more!
I think it would be nice to include one on HSDP (FSDP + DDP) to sit between the DDP, FSDP, and 2D (TP + FSDP) sections. And maybe one on context/sequence parallelism too?
I think it is common to have access to some sort of PCIe-based multi-GPU server, and being able to use GPUs across sockets is a useful trick that HSDP enables. Something like the sketch below is what I have in mind.
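For reference, here's a minimal sketch of the kind of HSDP setup I mean, assuming PyTorch >= 2.2 launched with torchrun (one process per GPU); the `Linear` model is a placeholder and the training loop is omitted:

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Standard torchrun setup: one process per GPU.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# 2D device mesh: the outer dim replicates (DDP-style) across nodes,
# the inner dim shards (FSDP-style) within each node. On a single
# PCIe server, the shape could instead be (sockets, gpus_per_socket)
# so sharding traffic stays within a socket.
gpus_per_node = torch.cuda.device_count()
num_nodes = dist.get_world_size() // gpus_per_node
mesh = init_device_mesh(
    "cuda",
    (num_nodes, gpus_per_node),
    mesh_dim_names=("replicate", "shard"),
)

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
model = FSDP(
    model,
    device_mesh=mesh,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
```

Launched with e.g. `torchrun --nnodes=2 --nproc-per-node=8 train.py`, this keeps the heavy all-gather/reduce-scatter traffic inside each node and only all-reduces gradients across the replicate dimension.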
Pipeline parallelism is just too hard for me to get my head around, so you can ignore that one!
Many thanks again for putting this all together.