Megatron-LM
[QUESTION] Can fp8 and pipeline parallelism be used together?
Hello, can fp8 and pipeline parallelism be used together? When I enable both, training hangs and is eventually timed out by NCCL. The training code starts up, but no logging output ever appears.
Using fp8 with model parallelism works fine, though.
Has anyone else noticed this?
Thanks!
@exnx Sorry, I don't quite understand. FP8 uses its own independent process groups to keep its reductions accurate. Could the problem be in how your fp8 group and pipeline group are set up?
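One way to sanity-check the group setup offline is a small pure-Python sketch like the one below. It is only an illustration under stated assumptions: the rank layout (tensor-parallel ranks contiguous, then data-parallel, then pipeline stages) and the idea that each fp8 reduction group covers all tensor- and data-parallel ranks of a single pipeline stage are assumptions, not something read out of your configuration. The names `build_groups` and `check_groups` are hypothetical helpers, not Megatron-LM APIs. The point is that a collective hangs until the NCCL timeout if ranks disagree about who is in a group, so it is worth confirming that no fp8 group spans more than one pipeline stage.

```python
def build_groups(world_size, tp, pp):
    """Build pipeline groups and (assumed) fp8 reduction groups as rank lists.

    Assumed layout: rank = stage * (dp * tp) + dp_idx * tp + tp_idx.
    """
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    dp = world_size // (tp * pp)
    stage_size = tp * dp  # number of ranks in one pipeline stage

    # One pipeline group per position inside a stage: it links the same
    # position across all stages.
    pipeline_groups = [
        [stage * stage_size + offset for stage in range(pp)]
        for offset in range(stage_size)
    ]
    # Assumption: one fp8 reduction group per pipeline stage, containing
    # every tensor- and data-parallel rank of that stage.
    fp8_groups = [
        list(range(stage * stage_size, (stage + 1) * stage_size))
        for stage in range(pp)
    ]
    return pipeline_groups, fp8_groups


def check_groups(world_size, pipeline_groups, fp8_groups):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for name, groups in (("pipeline", pipeline_groups), ("fp8", fp8_groups)):
        seen = sorted(rank for group in groups for rank in group)
        if seen != list(range(world_size)):
            problems.append(f"{name} groups do not cover each rank exactly once")
    # An fp8 group that shares two or more ranks with a pipeline group
    # spans more than one pipeline stage.
    for i, fp8_group in enumerate(fp8_groups):
        for pipeline_group in pipeline_groups:
            shared = set(fp8_group) & set(pipeline_group)
            if len(shared) > 1:
                problems.append(
                    f"fp8 group {i} spans pipeline stages via ranks {sorted(shared)}"
                )
    return problems


if __name__ == "__main__":
    pipeline_groups, fp8_groups = build_groups(world_size=8, tp=2, pp=2)
    print("pipeline groups:", pipeline_groups)
    print("fp8 groups:", fp8_groups)
    print("problems:", check_groups(8, pipeline_groups, fp8_groups))
```

If the groups your job actually creates differ from what this prints, for example because different ranks build them in a different order, that mismatch would be the first thing to look at.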