How to run train_lora.py

Open samarthsarin opened this issue 2 years ago • 13 comments

Hi Team,

Can someone please provide the command-line instructions to run the train_lora.py file? The README only contains the command for train_mem.py.

Thank you

samarthsarin avatar Apr 25 '23 06:04 samarthsarin

Me too! I'd also like to ask how to train a LoRA for Vicuna.

tjb-tech avatar Apr 25 '23 07:04 tjb-tech

Please check here (note that the given configuration may not work; it's just an example of the use case).

ZYHowell avatar Apr 25 '23 22:04 ZYHowell

Have you managed to get the LoRA training running?

alexanderfrey avatar Apr 27 '23 08:04 alexanderfrey

I have not run it successfully: even with LoRA training, I still get an out-of-memory error on my GPU (3090, 24 GB) when fine-tuning the 7B base model. How much GPU memory is required for LoRA training? I'm also curious why I could easily run LoRA training with alpaca-lora when fine-tuning the same 7B base model. Is FastChat heavier?

tjb-tech avatar Apr 27 '23 09:04 tjb-tech

For OOM, please try adding these lines from Alpaca-LoRA. I'll open a PR if that works.

ZYHowell avatar Apr 27 '23 16:04 ZYHowell
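
For reference, the Alpaca-LoRA memory-saving lines referred to above look roughly like the following. This is a minimal sketch, not the exact upstream code; the base-model name is only an example, and it assumes peft and bitsandbytes are installed:

```python
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_int8_training  # renamed prepare_model_for_kbit_training in newer peft

# Load the base model with 8-bit weights (via bitsandbytes) instead of fp16/fp32.
model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # example base model
    load_in_8bit=True,
    device_map="auto",
)

# Cast layer norms to fp32 and enable input gradients for stable 8-bit training.
model = prepare_model_for_int8_training(model)
model.config.use_cache = False  # the KV cache conflicts with gradient checkpointing
```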

I have tried this. It does train with alpaca-lora, but when I run inference, it produces results as if the model had never been trained.

pauliustumas avatar Apr 27 '23 19:04 pauliustumas
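
A common cause of that symptom is that the LoRA adapter weights are never applied at inference time: if only the base model is loaded, the output naturally looks untrained. Here is a minimal sketch of loading an adapter with peft, where the checkpoint path is hypothetical:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model first.
base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # example base model
    torch_dtype=torch.float16,
    device_map="auto",
)

# Then wrap it with the trained LoRA adapter; without this step the model
# behaves exactly like the untrained base model.
model = PeftModel.from_pretrained(base, "output/lora-checkpoint")  # hypothetical path
model.eval()
```

It is also worth checking that the saved checkpoint contains non-empty adapter weights (adapter_model.bin); some older peft versions saved an empty state dict unless the save hook was patched, which Alpaca-LoRA works around.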

flash_attn is not supported. Use load_in_8bit, PEFT, and bitsandbytes to accelerate; this requires about 13 GB of GPU memory.

See https://github.com/git-cloner/llama-lora-fine-tuning#341-fine-tuning-command for the training script.

train_lora.py needs to be modified; refer to https://github.com/git-cloner/llama-lora-fine-tuning/blob/main/fastchat/train/train_lora.py and https://github.com/git-cloner/llama-lora-fine-tuning/blob/main/deepspeed-config.json

little51 avatar May 31 '23 06:05 little51
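
To illustrate how the load_in_8bit + PEFT + bitsandbytes pieces described above typically fit together, here is a hedged sketch; the base model, LoRA hyperparameters, and target modules are illustrative, not the exact values from the linked script:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# 8-bit weights cut the 7B model's footprint to roughly half of fp16.
model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # example base model
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)

# LoRA adapters on the attention projections; only these small matrices are trained.
lora_config = LoraConfig(
    r=8,                                  # matches --lora_r 8 in the linked command
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # common choice for LLaMA-family models
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```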

@little51 How to support multi-GPU training?

zl1994 avatar May 31 '23 12:05 zl1994

@zl1994 Multi-GPU training will hit `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu`, which has not been resolved yet.

little51 avatar May 31 '23 12:05 little51

Yeah, I have encountered the same problem and am still working on it. If you have any good news, please let me know😂. Thank you

zl1994 avatar Jun 01 '23 03:06 zl1994
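
One workaround often suggested for this error when load_in_8bit is combined with one-process-per-GPU launchers (DeepSpeed, torchrun) is to stop using device_map="auto", which may shard the model across GPUs and CPU, and instead pin the entire model to each process's local GPU. A hedged sketch, assuming the launcher sets LOCAL_RANK:

```python
import os

from transformers import AutoModelForCausalLM

# Each DeepSpeed/DDP process handles exactly one GPU, so place the whole
# model on that GPU rather than letting device_map="auto" spread it around.
local_rank = int(os.environ.get("LOCAL_RANK", 0))

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # example base model
    load_in_8bit=True,
    device_map={"": local_rank},      # "" = map every module to this device
)
```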

If you have multiple GPUs, update fastchat/train/train_lora.py and use the --num_gpus parameter, for example:

```
CUDA_VISIBLE_DEVICES=0,1 deepspeed --num_gpus=2 fastchat/train/train_lora.py \
    --deepspeed ./deepspeed-config.json \
    --lora_r 8 \
    ...
```

See https://github.com/git-cloner/llama-lora-fine-tuning/blob/main/fastchat/train/train_lora.py and https://github.com/git-cloner/llama-lora-fine-tuning#341-fine-tuning-command

little51 avatar Jun 02 '23 13:06 little51
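
A deepspeed-config.json along these lines is typical for this kind of LoRA run; this is an illustrative sketch, not the exact file from the linked repository:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

The "auto" values are filled in by the Hugging Face Trainer integration from the corresponding command-line arguments.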

So, on a single NVIDIA 3090, FastChat doesn't support fine-tuning the vicuna-7B model with LoRA, right?

Dandelionym avatar Jun 15 '23 05:06 Dandelionym

It's not a LoRA issue; it's the flash_attn problem. You need to test on the 3090 to see whether flash_attn raises an error.

little51 avatar Jun 16 '23 08:06 little51

Here is a validated implementation for fine-tuning vicuna-7b on a single 3090 GPU or on multiple GPUs: https://github.com/chengzl18/vicuna-lora

chengzl18 avatar Jul 02 '23 22:07 chengzl18