Stick-To issues

How to combine DeepSpeed with Megatron

When I train with this code, the loss becomes very large after a small number of steps:

```
deepspeed.init_distributed("nccl")
mpu.initialize_model_parallel(int(args.tp), int(args.pp))
mpu.model_parallel_cuda_manual_seed(1234)
mpu.checkpoint = deepspeed.checkpointing.checkpoint
mpu.get_cuda_rng_tracker = deepspeed.checkpointing.get_cuda_rng_tracker
mpu.model_parallel_cuda_manual_seed = deepspeed.checkpointing.model_parallel_cuda_manual_seed...
```
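
One thing that may be worth checking, offered as a possibility rather than a diagnosis: in the snippet above, `mpu.model_parallel_cuda_manual_seed(1234)` is called before that name is rebound to the DeepSpeed version, so the seeding call goes to the original function. Whether this matters depends on the truncated rest of the script. The toy sketch below only illustrates the Python rebinding behaviour; `seed_with_order`, `original_seed`, `replacement_seed` and `manual_seed` are hypothetical stand-ins, not Megatron or DeepSpeed APIs.

```python
from types import SimpleNamespace


def seed_with_order(patch_first):
    """Return which stand-in seeding function received the seed."""
    received = {"original": None, "replacement": None}

    # Hypothetical stand-in for the module's original seeding function.
    def original_seed(seed):
        received["original"] = seed

    # Hypothetical stand-in for the function patched in afterwards.
    def replacement_seed(seed):
        received["replacement"] = seed

    mpu = SimpleNamespace(manual_seed=original_seed)
    if patch_first:
        # Rebind first, then call: the replacement receives the seed.
        mpu.manual_seed = replacement_seed
        mpu.manual_seed(1234)
    else:
        # Same order as the snippet: call first, rebind afterwards.
        mpu.manual_seed(1234)
        mpu.manual_seed = replacement_seed
    return received


print(seed_with_order(patch_first=False))
# {'original': 1234, 'replacement': None}
print(seed_with_order(patch_first=True))
# {'original': None, 'replacement': 1234}
```

If the real setup behaves the same way, a quick experiment would be to move the seeding call after the three rebinding lines and compare the loss curves.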