Issues by Stick-To (1 result)
When I train with this code, the loss becomes very large after a small number of steps:

```
deepspeed.init_distributed("nccl")
mpu.initialize_model_parallel(int(args.tp), int(args.pp))
mpu.model_parallel_cuda_manual_seed(1234)
mpu.checkpoint = deepspeed.checkpointing.checkpoint
mpu.get_cuda_rng_tracker = deepspeed.checkpointing.get_cuda_rng_tracker
mpu.model_parallel_cuda_manual_seed = deepspeed.checkpointing.model_parallel_cuda_manual_seed
...
```
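To pin down what "a small number of steps" means in a report like this, it can help to record the per-step loss and find the first step where it blows up. The helper below is only a minimal sketch and is not part of the original code: the name `first_divergent_step` and the `window` and `factor` thresholds are hypothetical choices, and it assumes the losses are collected as plain positive Python floats.

```python
import math


def first_divergent_step(losses, window=5, factor=10.0):
    """Return the index of the first step that looks divergent, or None.

    A step counts as divergent when its loss is NaN/inf, or when it exceeds
    `factor` times the mean of the up-to-`window` losses just before it.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    for step, loss in enumerate(losses):
        # NaN or infinite loss is always treated as divergence.
        if not math.isfinite(loss):
            return step
        # Compare against the mean of the most recent finite losses.
        recent = losses[max(0, step - window):step]
        if recent and loss > factor * (sum(recent) / len(recent)):
            return step
    return None


if __name__ == "__main__":
    # Made-up loss values purely for illustration.
    example = [2.5, 2.4, 2.3, 2.2, 40.0, 900.0]
    print(first_divergent_step(example))
```

Reporting that step index together with the learning rate and precision settings in use makes it easier for others to reproduce the problem.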