Is there any benchmark comparison with Megatron-LM?

Open sequoiar opened this issue 2 years ago • 0 comments

I'm not sure how BytePS handles distributed training across thousands of GPUs if NCCL only runs within a single node, especially without SHARP.

Jun 16 '23 03:06 sequoiar