Is there any benchmark comparison with Megatron-LM?
I'm not sure how BytePS handles distributed training across thousands of GPUs if NCCL only runs within a single node, especially without SHARP.