Cupy Buffers Initialized Successfully.
Pop out errors
Finished the initialization step at rank 0
Pop out errors
Finished the initialization step at rank 1
Using /root/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/py310_cu116/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.07267284393310547 seconds
Loading extension module utils...
Time to load utils op: 0.1021263599395752 seconds
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/finetune.py:180 in <module> │
│ │
│ │
│ 177 │
│ 178 │
│ 179 if __name__ == "__main__": │
│ ❱ 180 │ main() │
│ 181 │
│ │
│ /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/finetune.py:173 in main │
│ │
│ 170 │ │ # callbacks=[TensorBoardCallback(writer)], │
│ 171 │ │ data_collator=data_collator, │
│ 172 │ ) # initialize the Trainer used to train the model │
│ ❱ 173 │ trainer.train() # train the model │
│ 174 │ # writer.close() # close the TensorBoard writer │
│ 175 │ # save model │
│ 176 │ model.save_pretrained(training_args.output_dir) # save the fine-tuned model to the output dir │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1662 in train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1929 in _inner_training_loop │
│ │
│ 1926 │ │ │ │ │ with model.no_sync(): │
│ 1927 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1928 │ │ │ │ else: │
│ ❱ 1929 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1930 │ │ │ │ │
│ 1931 │ │ │ │ if ( │
│ 1932 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:2699 in training_step │
│ │
│ 2696 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2697 │ │ │
│ 2698 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2699 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2700 │ │ │
│ 2701 │ │ if self.args.n_gpu > 1: │
│ 2702 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/finetune.py:86 in │
│ compute_loss │
│ │
│ 83 │
│ 84 class ModifiedTrainer(Trainer): │
│ 85 │ def compute_loss(self, model, inputs, return_outputs=False): │
│ ❱ 86 │ │ return model( │
│ 87 │ │ │ input_ids=inputs["input_ids"], │
│ 88 │ │ │ labels=inputs["labels"], │
│ 89 │ │ ).loss │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /opt/conda/lib/python3.10/site-packages/deepspeed/utils/nvtx.py:11 in wrapped_fn │
│ │
│ 8 │ function call.""" │
│ 9 │ def wrapped_fn(*args, **kwargs): │
│ 10 │ │ get_accelerator().range_push(func.__qualname__) │
│ ❱ 11 │ │ ret_val = func(*args, **kwargs) │
│ 12 │ │ get_accelerator().range_pop() │
│ 13 │ │ return ret_val │
│ 14 │
│ │
│ /opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py:1846 in forward │
│ │
│ 1843 │ │ if self.fp16_auto_cast(): │
│ 1844 │ │ │ inputs = self._cast_inputs_half(inputs) │
│ 1845 │ │ │
│ ❱ 1846 │ │ loss = self.module(*inputs, **kwargs) │
│ 1847 │ │ │
│ 1848 │ │ if self.zero_optimization_partition_weights(): │
│ 1849 │ │ │ # Disable automated discovery of external parameters │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /workspace/LLM-Adapters/peft/src/peft/peft_model.py:532 in forward │
│ │
│ 529 │ │ **kwargs, │
│ 530 │ ): │
│ 531 │ │ if not isinstance(self.peft_config, PromptLearningConfig): │
│ ❱ 532 │ │ │ return self.base_model( │
│ 533 │ │ │ │ input_ids=input_ids, │
│ 534 │ │ │ │ attention_mask=attention_mask, │
│ 535 │ │ │ │ inputs_embeds=inputs_embeds, │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py:1160 in │
│ forward │
│ │
│ 1157 │ │ use_cache = use_cache if use_cache is not None else self.config.use_cache │
│ 1158 │ │ return_dict = return_dict if return_dict is not None else self.config.use_return │
│ 1159 │ │ │
│ ❱ 1160 │ │ transformer_outputs = self.transformer( │
│ 1161 │ │ │ input_ids=input_ids, │
│ 1162 │ │ │ position_ids=position_ids, │
│ 1163 │ │ │ attention_mask=attention_mask, │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py:907 in │
│ forward │
│ │
│ 904 │ │ │ raise ValueError("You have to specify either input_ids or inputs_embeds") │
│ 905 │ │ │
│ 906 │ │ if inputs_embeds is None: │
│ ❱ 907 │ │ │ inputs_embeds = self.word_embeddings(input_ids) │
│ 908 │ │ │
│ 909 │ │ if past_key_values is None: │
│ 910 │ │ │ if self.pre_seq_len is not None: │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1212 in _call_impl │
│ │
│ 1209 │ │ │ bw_hook = hooks.BackwardHook(self, full_backward_hooks) │
│ 1210 │ │ │ input = bw_hook.setup_input_hook(input) │
│ 1211 │ │ │
│ ❱ 1212 │ │ result = forward_call(*input, **kwargs) │
│ 1213 │ │ if _global_forward_hooks or self._forward_hooks: │
│ 1214 │ │ │ for hook in (*_global_forward_hooks.values(), *self._forward_hooks.values()): │
│ 1215 │ │ │ │ hook_result = hook(self, input, result) │
│ │
│ /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py:160 in forward │
│ │
│ 157 │ │ │ │ self.weight[self.padding_idx].fill_(0) │
│ 158 │ │
│ 159 │ def forward(self, input: Tensor) -> Tensor: │
│ ❱ 160 │ │ return F.embedding( │
│ 161 │ │ │ input, self.weight, self.padding_idx, self.max_norm, │
│ 162 │ │ │ self.norm_type, self.scale_grad_by_freq, self.sparse) │
│ 163 │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py:2210 in embedding │
│ │
│ 2207 │ │ # torch.embedding_renorm_ │
│ 2208 │ │ # remove once script supports set_grad_enabled │
│ 2209 │ │ _no_grad_embedding_renorm_(weight, input, max_norm, norm_type) │
│ ❱ 2210 │ return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) │
│ 2211 │
│ 2212 │
│ 2213 def embedding_bag( │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)
[2023-04-08 16:00:51,018] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 9721
[2023-04-08 16:00:51,398] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 9722
[2023-04-08 16:00:51,399] [ERROR] [launch.py:324:sigkill_handler] ['/opt/conda/bin/python', '-u', 'finetune.py', '--local_rank=1'] exits with return code = 1
No, that didn't help. I tried other people's multi-GPU training code and hit the same error. At first I suspected an environment problem, so I rebuilt the Docker environment: Python 3.8, CUDA 11.0 plus NVIDIA's CUDA 11.6 compatibility package, CuPy 11.0, and everything else set up again from scratch, but the error is still there. Running with accelerate alone does work, and it is much faster than not using it at all, but the GPU memory load falls mostly on one card. In theory the Trainer has DeepSpeed and accelerate built in, but configuring them the normal way through TrainingArguments has no effect and raises this same error.
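One commonly suggested workaround for this specific cross-device error (a minimal sketch under assumed settings, not the poster's actual config) is to pin the whole model to the current rank's GPU when loading, rather than letting accelerate spread the layers across both cards. The checkpoint path and the LOCAL_RANK handling below are illustrative assumptions:

```python
import os
import torch
from transformers import AutoModel, AutoTokenizer

# The launcher (deepspeed / torchrun) sets LOCAL_RANK for each worker process.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

model_path = "./chatglm-6b"  # placeholder: local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    # Place every module on this rank's GPU so the embedding weights and the
    # input_ids end up on the same device under DDP.
    device_map={"": local_rank},
)
```

With this layout each process holds a full copy of the model on its own card, which is what plain DDP expects; device_map="auto" instead splits a single copy across the cards and only makes sense in a single-process run.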
@Mr-lonely0
I solved this problem, but the way it got solved is really bizarre...
My model had been downloaded to local storage with AutoModel. When I spun up a new container that had never downloaded the model through AutoModel and loaded the chatglm copy from my shared drive for multi-GPU DDP training, I got this error. When I pulled out the modeling_chatglm and tokenizer_chatglm source files and switched to the explicit ChatGLM tokenizer and model classes, running again gave a dimension-mismatch error instead. But after doing those two steps and then switching back to AutoModel and AutoTokenizer, amazingly it no longer errored and DDP training ran fine...
I went through this procedure twice; it really is black magic...
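For reference, a hedged sketch of the two loading paths the comment describes; the file and class names (modeling_chatglm.py, tokenization_chatglm.py, ChatGLMForConditionalGeneration, ChatGLMTokenizer) follow the chatglm-6b remote-code files and may differ for other checkpoint versions:

```python
# Path A: the usual AutoClass route, which resolves to the repo's remote code.
from transformers import AutoModel, AutoTokenizer

model_path = "./chatglm-6b"  # placeholder: local copy on the shared drive
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)

# Path B: copy the model source files next to the training script and
# import the concrete classes directly instead of going through AutoClass.
# from modeling_chatglm import ChatGLMForConditionalGeneration
# from tokenization_chatglm import ChatGLMTokenizer
# tokenizer = ChatGLMTokenizer.from_pretrained(model_path)
# model = ChatGLMForConditionalGeneration.from_pretrained(model_path)
```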
ChatGLM doesn't seem to support layer (pipeline) parallelism.
Don't use DeepSpeed. Use Hugging Face accelerate instead, or pass device_map="auto" when loading the model.
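A hedged sketch of that suggestion, using only standard transformers arguments; the checkpoint path is a placeholder:

```python
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm-6b"  # placeholder: hub id or a local directory
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",  # let accelerate shard the layers across the visible GPUs
)
```

Note that with device_map="auto" one model copy is split across GPUs inside a single process, so it should be launched without the deepspeed launcher; combining it with a DDP launcher is a common way to end up with exactly the cuda:0 / cuda:1 mismatch shown above.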
How do you solve this? I'm training offline on a local machine and get the same error!!! Any guidance would be appreciated.