When running multi-node distributed training with DeepSpeed and loading the opt-1.3b model, I get the error "a leaf Variable that requires grad is being used in an in-place operation":
```
colorful: Traceback (most recent call last):
colorful:   File "DeepSpeed-Chat/training/main_sup.py", line 339, in <module>
colorful:     main()
colorful:   File "DeepSpeed-Chat/training/main_sup.py", line 286, in main
colorful:     model, optimizer, _, lr_scheduler = deepspeed.initialize(
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/deepspeed/__init__.py", line 156, in initialize
colorful:     engine = DeepSpeedEngine(args=args,
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 286, in __init__
colorful:     self._configure_distributed_model(model)
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1087, in _configure_distributed_model
colorful:     self._broadcast_model()
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1017, in _broadcast_model
colorful:     dist.broadcast(p, groups._get_broadcast_src_rank(), group=self.data_parallel_group)
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/deepspeed/comm/comm.py", line 120, in log_wrapper
colorful:     return func(*args, **kwargs)
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/deepspeed/comm/comm.py", line 217, in broadcast
colorful:     return cdb.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/deepspeed/comm/torch.py", line 81, in broadcast
colorful:     return torch.distributed.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
colorful:   File "/home/vocust001/miniconda3/envs/ldm/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1201, in broadcast
colorful:     work.wait()
colorful: RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
```
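
For context, the call that triggers the failure (main_sup.py line 286 in the traceback) looks roughly like the sketch below. This is a minimal reconstruction, not the actual script: the ds_config values and the model-loading details are assumptions for illustration; only the `deepspeed.initialize` return signature is taken from the traceback.

```python
# Minimal sketch of the failing setup (assumed config values; the real
# main_sup.py may differ). The error is raised inside deepspeed.initialize,
# when _broadcast_model() broadcasts the freshly loaded parameters from
# rank 0 to the other data-parallel ranks.
import deepspeed
from transformers import AutoModelForCausalLM

ds_config = {
    "train_batch_size": 8,
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": True},
}

# Load opt-1.3b; its parameters are leaf tensors with requires_grad=True.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# Corresponds to main_sup.py line 286 in the traceback.
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```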