LLaMA-Factory
[FEATURE: ADD LISA ALGORITHM]
What does this PR do?
NEW FEATURE: add the LISA algorithm. See: https://arxiv.org/abs/2403.17919 (a brief sketch of the idea follows below).
Before submitting
- [x] Did you read the contributor guideline?
Fixes: https://github.com/hiyouga/LLaMA-Factory/issues/3087
References: https://github.com/OptimalScale/LMFlow/issues/726
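For context, LISA (Layerwise Importance Sampled AdamW) keeps most transformer layers frozen and periodically unfreezes a small random subset (per the paper, the embedding and LM head stay trainable throughout). A minimal sketch of the switching step, assuming a LLaMA-style Hugging Face model; the helper name and the model.model.layers path are illustrative, not this PR's actual code:

import random
import torch.nn as nn

def switch_active_layers(model: nn.Module, n_active: int = 2) -> None:
    # Freeze every decoder layer, then unfreeze a random subset.
    layers = model.model.layers  # LLaMA-style layout; adjust per architecture
    for layer in layers:
        for param in layer.parameters():
            param.requires_grad_(False)
    for idx in random.sample(range(len(layers)), n_active):
        for param in layers[idx].parameters():
            param.requires_grad_(True)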
When combining LISA with multiple GPUs, ZeRO-3, and gradient checkpointing, the following error occurs:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
return torch._C._nn.flatten_dense_tensors(tensors)
single_grad_partition = self.flatten(self.averaged_gradients[sub_group_id]).to...
I came up with the code below. The id of the optimizer changes when on_train_epoch_start is called. Downside: it still needs the lightning package installed and can only be performed in a separate project/Python file, something like this: https://lightning.ai/lightning-ai/studios/code-lora-from-scratch . Further updates will be reported.
def on_train_epoch_start(self, trainer: "L.Trainer", pl_module: "L.LightningModule"):
    # Every `epoch_interval` epochs, re-sample the active layers and
    # rebuild the optimizer so its state matches the new trainable set.
    if trainer.current_epoch % self.epoch_interval == 0:
        self.switch_active_layers()
        pl_module.optimizer_fn = torch.optim.Adam
        trainer.strategy.setup_optimizers(trainer)
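For reference, a hedged wiring example (assuming the method above lives on a lightning Callback subclass, here hypothetically named LISACallback, with epoch_interval and switch_active_layers defined on it):

import lightning as L

# Register the callback so active layers are re-sampled and the
# optimizer rebuilt at each epoch boundary (names above are assumed).
trainer = L.Trainer(max_epochs=3, callbacks=[LISACallback(epoch_interval=1)])
trainer.fit(lit_module, train_loader)  # your own LightningModule / DataLoader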
I have conducted experiments on llama2-7b using the full, lisa_2, and lisa_32 methods. From the loss curves (attached as an image), you can see that the training loss decreases, and full matches lisa_32.
The latest code borrows some of the implementation from LMFlow and Axolotl. Some implementation details have been cleaned up, and a debug option is provided.
I hope this will be merged.
I tried this and noticed that fine-tuning Qwen/Qwen1.5-0.5B consumes more than 18 GB of VRAM with the following config. Is this expected?
Config
#!/bin/bash
CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path Qwen/Qwen1.5-0.5B \
--dataset mhqg_1k \
--dataset_dir ../../data \
--template default \
--finetuning_type full \
--use_lisa \
--lisa_activated_layers 2 \
--lisa_interval_steps 5 \
--output_dir ../../saves/Qwen1.5-0.5B/lisa/sft \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 3192 \
--preprocessing_num_workers 16 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 100 \
--evaluation_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--max_samples 3000 \
--val_size 0.1 \
--plot_loss \
--fp16
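One generic way to sanity-check whether the freezing took effect (plain PyTorch, not an LLaMA-Factory API) is to count trainable parameters after the model is set up; with --lisa_activated_layers 2, the trainable share should be small between switches:

# Standard PyTorch diagnostic, independent of this PR's code.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / 1e6:.1f}M / total: {total / 1e6:.1f}M")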
System Info
$ uname -a
Linux 6bf7eb606868 5.4.0-152-generic #169-Ubuntu SMP Tue Jun 6 22:23:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A5000 Off | 00000000:81:00.0 Off | Off |
| 30% 32C P0 56W / 230W | 1MiB / 24564MiB | 2% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
When I used LISA to fine-tune Llama-2-7b on alpaca_gpt4_en with one A100 80G, the used memory increased sharply and exceeded 80G. I want to know how to solve this problem...
Config:
CUDA_VISIBLE_DEVICES=2 python src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path meta-llama/Llama-2-7b-chat-hf \
--dataset alpaca_gpt4_en \
--dataset_dir data \
--template default \
--finetuning_type full \
--use_lisa 1 \
--lisa_verbose 1 \
--lisa_activated_layers 2 \
--lisa_interval_steps 3 \
--output_dir saves/Llama-2-7b-chat-lisa-2-3 \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 1024 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--lr_scheduler_type cosine \
--logging_steps 5 \
--warmup_steps 0 \
--save_steps 30000 \
--learning_rate 5e-5 \
--num_train_epochs 1.0 \
--plot_loss \
--fp16
Error: [screenshot of traceback attached]
GPU info when running: [screenshot attached]
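One plausible contributor (my assumption, not a confirmed diagnosis of this PR): AdamW creates its moment buffers lazily per parameter, so if the same optimizer instance is kept across layer switches, state can pile up for every layer that has ever been active. A minimal sketch of one mitigation, pruning optimizer state for currently-frozen parameters right after a switch:

import torch

def prune_frozen_state(optimizer: torch.optim.Optimizer) -> None:
    # Drop stored state (e.g. AdamW exp_avg / exp_avg_sq) for params
    # that are frozen again, so memory does not grow with each switch.
    # Sketch only; untested against this PR's actual LISA code.
    for group in optimizer.param_groups:
        for p in group["params"]:
            if not p.requires_grad and p in optimizer.state:
                del optimizer.state[p]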
@neteroster Hello. I have the same problem as you; have you solved it?
@lovekdl Not yet.