FastChat
How to run train_lora.py
Hi Team,
Can someone please provide the command-line instructions to run train_lora.py? The README only contains the command for train_mem.py.
Thank you
Me too! I'd also like to ask how to train LoRA for Vicuna.
Please check here (note that the given configuration may not work; it's just an example of the use case).
Have you managed to get LoRA training running?
I have not run it successfully. Even using the LoRA training script, I still get an out-of-memory error on my GPU (3090, 24 GB) when fine-tuning the 7B base model. How much GPU memory is required for LoRA training? I'm also curious why I can easily run LoRA training with alpaca-lora when fine-tuning the same 7B base model. Is FastChat heavier?
For the OOM, please try adding these lines from alpaca-lora. I'll open a PR if that works.
I have tried this; it does train with alpaca-lora, but when I run inference it produces results as if it were not trained.
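One frequent cause of "trains fine but inference looks untrained" is that the LoRA adapter never gets applied at inference time (or, with some older peft versions, the adapter weights were saved as a nearly empty file, a known alpaca-lora issue). As a sanity check, here is a minimal sketch of applying a trained adapter for inference with peft; the base-model name and adapter path below are placeholder assumptions:

# Minimal sketch: apply a trained LoRA adapter at inference time.
# "huggyllama/llama-7b" and "./lora-checkpoint" are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "huggyllama/llama-7b"   # assumption: same base model used for training
adapter_path = "./lora-checkpoint"   # assumption: your LoRA output directory

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
# Without this step the unmodified base model runs, which looks "untrained".
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()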
flash_attn is not supported. Use load_in_8bit, PEFT, and bitsandbytes to reduce memory usage; it requires about 13 GB of GPU memory.
See https://github.com/git-cloner/llama-lora-fine-tuning#341-fine-tuning-command for the training script.
train_lora.py needs to be modified; refer to:
https://github.com/git-cloner/llama-lora-fine-tuning/blob/main/fastchat/train/train_lora.py
and
https://github.com/git-cloner/llama-lora-fine-tuning/blob/main/deepspeed-config.json
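For reference, the load_in_8bit / PEFT / bitsandbytes approach mentioned above boils down to something like the following. This is a minimal sketch, not the exact train_lora.py patch; the checkpoint name and LoRA hyperparameters are illustrative, and newer peft versions rename prepare_model_for_int8_training to prepare_model_for_kbit_training:

# Minimal sketch: load the base model in 8-bit and attach a LoRA adapter.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = "huggyllama/llama-7b"  # assumption: any LLaMA-7B checkpoint

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,           # bitsandbytes int8 weights, roughly 13 GB for 7B
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)  # cast norms, enable input grads

lora_config = LoraConfig(
    r=8,                         # matches the --lora_r 8 used elsewhere in this thread
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the small LoRA matrices are trainable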
@little51 How to support multi-GPU training?
@zl1994 Multi-GPU runs will hit RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu; this has not been resolved yet.
Yeah, I have encountered the same problem and am still working on it. If you have any good news, please let me know😂. Thank you
If you have multiple GPUs, update fastchat/train/train_lora.py and use the --num_gpus parameter, for example:
CUDA_VISIBLE_DEVICES=0,1 deepspeed --num_gpus=2 fastchat/train/train_lora.py \
    --deepspeed ./deepspeed-config.json \
    --lora_r 8 \
    ...
https://github.com/git-cloner/llama-lora-fine-tuning/blob/main/fastchat/train/train_lora.py
and
https://github.com/git-cloner/llama-lora-fine-tuning#341-fine-tuning-command
So, on a single NVIDIA 3090, FastChat doesn't support fine-tuning the vicuna-7B model with LoRA, right?
It's not because of LoRA; it's the flash_attn problem. You need to test on the 3090 to see whether flash_attn raises an error.
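If you want to check whether flash_attn itself is the blocker on your card, a quick probe like the following tells you whether it even loads (this assumes the flash-attn pip package is installed; the version attribute may differ between releases):

# Quick probe: does flash_attn import on this machine, and what GPU is it?
import torch

print("GPU:", torch.cuda.get_device_name(0))
print("Compute capability:", torch.cuda.get_device_capability(0))

try:
    import flash_attn  # assumption: installed via the flash-attn pip package
    print("flash_attn loaded:", getattr(flash_attn, "__version__", "unknown version"))
except Exception as e:  # ImportError or a CUDA-arch error raised at load time
    print("flash_attn failed to load:", e)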
Here is a validated implementation for fine-tuning vicuna-7b on a single 3090 GPU or on multiple GPUs: https://github.com/chengzl18/vicuna-lora