shuangt

Results 1 issues of shuangt

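I'm running into several errors and can't figure out what is causing them:

1. During data preparation, the process hangs in data_utils/__init__.py at torch.distributed.barrier(). Commenting out that line lets it run.
2. When loading a model to continue pretraining, even with "--no-load-optim" set, it still fails with: file not found zero_pp_rank_0_mp_rank_00_optim_states.pt
3. Without loading a pretrained model, pretraining hangs in pretrain_glm.py at the call iteration, skipped = train(model, optimizer,...). Nothing happened overnight, and the command-line output finally stops here: `172.16.10.11: [2023-04-13 21:10:08,253] [INFO] [checkpointing.py:553:forward] Activation Checkpointing Information 172.16.10.11: [2023-04-13 21:10:08,254] [INFO] [checkpointing.py:554:forward] ----Partition Activations False, CPU CHECKPOINTING False...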