
[QUESTION] Can fp8 and pipeline parallelism be used together?

Open exnx opened this issue 1 year ago • 1 comments


Hello, can fp8 and pipeline parallelism be used together? When I enable both, training hangs and eventually hits an NCCL timeout. The training code starts up, but no logging output ever appears.

I can use fp8 and model parallelism ok, though.

Curious if anyone else noticed this?

Thanks!

exnx, Jul 09 '24 05:07

@exnx Sorry, I don't understand. FP8 uses its own independent process groups to keep the reduction accurate. Could the problem be in how your FP8 group and pipeline group are set up?
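
To make the group question concrete, here is a minimal pure-Python sketch of the kind of consistency check the reply hints at. It is not Megatron-LM's code. The rank layout (pipeline stage as the slowest-varying index), the idea that the FP8 reduction group covers all ranks of a single pipeline stage, and the helper names `build_groups`, `stage_of`, and `check_layout` are all assumptions made for illustration. The one general fact it relies on is that a collective hangs when some member of its process group never enters the call, which is one way a mismatched group setup could surface as an NCCL timeout.

```python
# Hypothetical sketch: enumerate rank groups for an assumed layout and check
# that the FP8 reduction groups never straddle pipeline stages.
# Assumed layout (not taken from Megatron-LM):
#   rank = pp_idx * (dp * tp) + dp_idx * tp + tp_idx


def build_groups(world_size, tp, pp):
    """Return (pipeline_groups, fp8_groups) as lists of rank lists."""
    assert world_size % (tp * pp) == 0, "world size must be divisible by tp * pp"
    dp = world_size // (tp * pp)
    stage_size = dp * tp  # number of ranks that belong to one pipeline stage

    # A pipeline group links the ranks with the same (dp_idx, tp_idx)
    # across all stages: one rank per stage.
    pipeline_groups = [
        [stage * stage_size + offset for stage in range(pp)]
        for offset in range(stage_size)
    ]

    # Assumption: the FP8 reduction group holds every rank of one stage.
    fp8_groups = [
        list(range(stage * stage_size, (stage + 1) * stage_size))
        for stage in range(pp)
    ]
    return pipeline_groups, fp8_groups


def stage_of(rank, world_size, pp):
    """Pipeline stage index of a rank under the assumed layout."""
    return rank // (world_size // pp)


def check_layout(world_size, tp, pp):
    """Return True if both kinds of group are consistent with the layout."""
    pipeline_groups, fp8_groups = build_groups(world_size, tp, pp)

    # Each FP8 group must sit inside exactly one pipeline stage.
    for group in fp8_groups:
        if len({stage_of(r, world_size, pp) for r in group}) != 1:
            return False

    # Each pipeline group must contain exactly one rank from every stage.
    for group in pipeline_groups:
        if sorted(stage_of(r, world_size, pp) for r in group) != list(range(pp)):
            return False

    # Every rank must appear in exactly one group of each kind.
    all_ranks = list(range(world_size))
    for groups in (pipeline_groups, fp8_groups):
        if sorted(r for g in groups for r in g) != all_ranks:
            return False
    return True


if __name__ == "__main__":
    pipeline_groups, fp8_groups = build_groups(world_size=8, tp=2, pp=2)
    print("pipeline groups:", pipeline_groups)
    print("fp8 groups:     ", fp8_groups)
    print("layout consistent:", check_layout(8, 2, 2))
```

Printing the same lists on every rank of a real job and comparing them against the groups the framework actually created is one low-cost way to rule out a group mismatch before looking elsewhere for the hang.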