Does torchkeras support multi-GPU training?

Does it support multi-GPU training?

Jul 15 '23 08:07 lacusrinz

Yes, it does. torchkeras is built on accelerate, so see https://github.com/huggingface/accelerate for how to use it. DeepSpeed is recommended.
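Since DeepSpeed is recommended here, note that it is driven by a JSON config. Below is a minimal sketch that builds one as a Python dict. It assumes ZeRO stage 2, the helper name `make_deepspeed_config` is hypothetical, and the values are placeholders. The sketch encodes the batch-size constraint DeepSpeed checks at startup: `train_batch_size` must equal `train_micro_batch_size_per_gpu * gradient_accumulation_steps * number of GPUs`.

```python
import json


def make_deepspeed_config(micro_batch_per_gpu: int,
                          grad_accum_steps: int,
                          num_gpus: int,
                          zero_stage: int = 2) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper, placeholder values).

    DeepSpeed requires train_batch_size ==
    train_micro_batch_size_per_gpu * gradient_accumulation_steps * world size,
    so the total is derived here instead of being typed in by hand.
    """
    return {
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * num_gpus,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": {"stage": zero_stage},
    }


if __name__ == "__main__":
    config = make_deepspeed_config(micro_batch_per_gpu=4, grad_accum_steps=8, num_gpus=2)
    # Save this output as ds_config.json if you want to reuse it.
    print(json.dumps(config, indent=2))
```

With 2 GPUs, a micro-batch of 4 and 8 accumulation steps, `train_batch_size` comes out to 64. As far as I know, `accelerate config` asks whether to use DeepSpeed and can take a path to such a JSON file.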

Jul 15 '23 14:07 lyhue1991

> Yes, it does. torchkeras is built on accelerate, so see https://github.com/huggingface/accelerate for how to use it. DeepSpeed is recommended.

Hi, sorry to bother you.

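I tried this out in detail, and calling `fit_ddp` raises an error. Do I need to modify the `fit` and `fit_ddp` methods to get multi-GPU training working?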



```python
ckpt_path = 'baichuan13b_ner'  # peft_model, dl_train, dl_val, bnb and KerasModel come from earlier cells

optimizer = bnb.optim.adamw.AdamW(peft_model.parameters(),
                                  lr=6e-05, is_paged=True)  # 'paged_adamw'
# Initialize the KerasModel
keras_model = KerasModel(peft_model, loss_fn=None, optimizer=optimizer)

# Load the fine-tuned weights
keras_model.load_ckpt(ckpt_path)

# Train on multiple GPUs
keras_model.fit_ddp(num_processes=2,
                    train_data=dl_train,
                    val_data=dl_val,
                    epochs=100,
                    patience=10,
                    monitor='val_loss',
                    mode='min',
                    ckpt_path=ckpt_path)
```


```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[30], line 12
      9 keras_model.load_ckpt(ckpt_path)
     11 # Train on multiple GPUs
---> 12 keras_model.fit_ddp(num_processes=2,
     13                     train_data=dl_train,
     14                     val_data=dl_val,
     15                     epochs=100,
     16                     patience=10,
     17                     monitor='val_loss',
     18                     mode='min',
     19                     ckpt_path=ckpt_path)

File ~/anaconda3/envs/baichuan13b/lib/python3.9/site-packages/torchkeras/kerasmodel.py:282, in KerasModel.fit_ddp(self, num_processes, train_data, val_data, epochs, ckpt_path, patience, monitor, mode, callbacks, plot, wandb, quiet, mixed_precision, cpu, gradient_accumulation_steps)
    279 from accelerate import notebook_launcher
    280 args = (train_data,val_data,epochs,ckpt_path,patience,monitor,mode,
    281     callbacks,plot,wandb,quiet,mixed_precision,cpu,gradient_accumulation_steps)
--> 282 notebook_launcher(self.fit, args, num_processes=num_processes)

File ~/anaconda3/envs/baichuan13b/lib/python3.9/site-packages/accelerate/launchers.py:116, in notebook_launcher(function, args, num_processes, mixed_precision, use_port)
    113 from torch.multiprocessing.spawn import ProcessRaisedException
    115 if len(AcceleratorState._shared_state) > 0:
--> 116     raise ValueError(
    117         "To launch a multi-GPU training from your notebook, the `Accelerator` should only be initialized "
    118         "inside your training function. Restart your notebook and make sure no cells initializes an "
    119         "`Accelerator`."
    120     )
    122 if torch.cuda.is_initialized():
    123     raise ValueError(
    124         "To launch a multi-GPU training from your notebook, you need to avoid running any instruction "
    125         "using `torch.cuda` in any cell. Restart your notebook and make sure no cells use any CUDA "
    126         "function."
    127     )

ValueError: To launch a multi-GPU training from your notebook, the `Accelerator` should only be initialized inside your training function. Restart your notebook and make sure no cells initializes an `Accelerator`.
```

Aug 02 '23 16:08 looperEit

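I'm getting the same error. Calling `fit_ddp()` on multiple GPUs raises it because CUDA has already been initialized. Did you manage to solve it?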

Aug 03 '23 08:08 S-Moer

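> I'm getting the same error. Calling `fit_ddp()` on multiple GPUs raises it because CUDA has already been initialized. Did you manage to solve it?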

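I've found the cause:

In the author's code, the accelerator is only invoked once training has already started, so you first have to make sure no code before that point has put anything on the GPU. `notebook_launcher` checks `torch.cuda.is_initialized()`, and if it returns True it raises the error you describe. However, merely importing bitsandbytes-related packages makes `torch.cuda.is_initialized()` return True, so multi-GPU training from a notebook may not be possible at all.

I also tried moving the code into a .py file, but because the model is quantized, I got a different error saying an 8-bit model cannot run on multiple GPUs. So I'm considering using https://github.com/hiyouga/LLaMA-Efficient-Tuning for multi-GPU training instead.

Thanks to 不负长风-san in the group chat.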
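The two guards visible in the traceback (accelerate `launchers.py`, lines 115 to 127) can be paraphrased as a small diagnostic. This is a simplified re-creation for illustration, not accelerate's code. The function name `launch_blockers` is hypothetical. In a real notebook you would pass it `AcceleratorState._shared_state` and `torch.cuda.is_initialized()`.

```python
def launch_blockers(accelerator_shared_state: dict, cuda_initialized: bool) -> list:
    """Return the reasons notebook_launcher would refuse a multi-GPU launch.

    Simplified paraphrase of the two checks shown in the traceback; the real
    launcher raises ValueError on the first failing check instead of listing them.
    """
    reasons = []
    if len(accelerator_shared_state) > 0:
        reasons.append("an Accelerator was already created outside the training function")
    if cuda_initialized:
        reasons.append("CUDA was already initialized in this process (e.g. by an import)")
    return reasons


# Fresh kernel, nothing has touched the GPU yet: the launch can proceed.
print(launch_blockers({}, False))
# An earlier cell imported a CUDA-initializing library: the second check trips.
print(launch_blockers({}, True))
```

Neither condition can be undone inside a running kernel, which is why both error messages say to restart the notebook. The usual workaround is to keep every CUDA-touching import and the model construction inside the function handed to `notebook_launcher`.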

Aug 03 '23 08:08 looperEit

Thanks for the reply. We went through pretty much the same journey.

Aug 05 '23 05:08 S-Moer

Take a look at this example: https://github.com/xxm1668/chatglm2_lora/blob/main/train2.py

Sep 14 '23 11:09 lyhue1991

> Take a look at this example: https://github.com/xxm1668/chatglm2_lora/blob/main/train2.py

This train2.py only gets DeepSpeed running; isn't it still training on a single GPU? @lyhue1991
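One way to check this empirically: as far as I know, torchrun-style launchers (which accelerate and DeepSpeed use for multi-GPU runs) start one process per GPU and export `WORLD_SIZE`, `RANK` and `LOCAL_RANK` to each of them, whereas a plain `python train2.py` run leaves them unset. The sketch below uses hypothetical helper names and only reports what the current process sees. Printing it at the top of the script shows whether more than one process is actually running.

```python
import os


def world_size() -> int:
    """Number of processes in the job; 1 when not started by a distributed launcher."""
    return int(os.environ.get("WORLD_SIZE", "1"))


def describe_launch() -> str:
    """Human-readable summary of this process's place in the job."""
    n = world_size()
    if n == 1:
        return "single process: training runs on one device"
    rank = int(os.environ.get("RANK", "0"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return f"process {rank} of {n} (local rank {local_rank})"


print(describe_launch())
```

In a genuine two-GPU run you would expect two lines of output, one per process, each with a different rank.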

Oct 19 '23 11:10 onair1314

> I've found the cause:
>
> [...] merely importing bitsandbytes-related packages makes `torch.cuda.is_initialized()` return True, so multi-GPU training from a notebook may not be possible at all. [...] So I'm considering using https://github.com/hiyouga/LLaMA-Efficient-Tuning for multi-GPU training instead.

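Hi, I'm getting a similar error: `problematic_imports = are_libraries_initialized("bitsandbytes")` detected that bitsandbytes had been initialized too early. But the link you provided no longer exists, so there's nothing left to refer to...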
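The check in that message can be reproduced with the standard library. As far as I can tell, accelerate's `are_libraries_initialized` just looks the given names up in `sys.modules`. That means any earlier cell that ran `import bitsandbytes` trips it, even if nothing was put on the GPU yet. The helper `already_imported` below is a hypothetical stand-in for that check. `training_function` only illustrates the usual workaround of deferring the imports into the function passed to `notebook_launcher`, and its body is left as a placeholder.

```python
import sys

# Libraries assumed to initialize CUDA when imported (only bitsandbytes is
# named in the error above; extend the tuple if needed).
CUDA_ON_IMPORT = ("bitsandbytes",)


def already_imported(*library_names: str) -> list:
    """Return the names that are already present in sys.modules."""
    return [name for name in library_names if name in sys.modules]


def training_function():
    # Deferred imports: they only run inside the worker processes that
    # notebook_launcher spawns, never in the notebook's main process.
    import bitsandbytes as bnb  # noqa: F401
    from torchkeras import KerasModel  # noqa: F401
    # ... build the model, optimizer and dataloaders here, then call fit ...


leftovers = already_imported(*CUDA_ON_IMPORT)
if leftovers:
    print("Restart the kernel; imported too early:", leftovers)
else:
    print("OK to call notebook_launcher(training_function, num_processes=2)")
```

Because `fit_ddp` passes `self.fit` to the launcher, a model built in an earlier cell cannot benefit from this pattern. The model construction would have to move inside a function like `training_function` as well.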

Apr 11 '24 01:04 liyunhan
