
[FEATURE: ADD LISA ALGORITHM]

Open qibaoyuan opened this pull request 10 months ago • 10 comments

What does this PR do?

NEW FEATURE: ADD LISA ALGORITHM, SEE: https://arxiv.org/abs/2403.17919
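
In short, LISA keeps the embeddings and LM head trainable, freezes all intermediate decoder layers, and re-activates a small random subset of them every fixed number of steps, so only a few layers carry gradients and optimizer state at any one time. A minimal sketch of the layer-switching step (the attribute names assume a LLaMA-style HF model and are illustrative, not this PR's code):

import random
import torch.nn as nn

def switch_active_layers(model: nn.Module, n_active: int = 2) -> None:
    # Freeze every decoder layer, then unfreeze a random subset (LISA-style).
    layers = model.model.layers
    for layer in layers:
        for p in layer.parameters():
            p.requires_grad_(False)
    # Embeddings and LM head stay trainable throughout training.
    for p in model.model.embed_tokens.parameters():
        p.requires_grad_(True)
    for p in model.lm_head.parameters():
        p.requires_grad_(True)
    # Sample which decoder layers participate until the next switch.
    for idx in random.sample(range(len(layers)), k=n_active):
        for p in layers[idx].parameters():
            p.requires_grad_(True)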

Before submitting

qibaoyuan avatar Apr 02 '24 10:04 qibaoyuan

fixes: https://github.com/hiyouga/LLaMA-Factory/issues/3087

hiyouga avatar Apr 02 '24 14:04 hiyouga

Takes https://github.com/OptimalScale/LMFlow/issues/726

hiyouga avatar Apr 03 '24 06:04 hiyouga

When combining LISA with multiple GPUs, ZeRO-3, and gradient checkpointing, the following error occurs:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)

return torch._C._nn.flatten_dense_tensors(tensors)
single_grad_partition = self.flatten(self.averaged_gradients[sub_group_id]).to...

yetionyo avatar Apr 05 '24 14:04 yetionyo

I came up with the code below. The id of the optimizer changes when on_train_epoch_start is called. Downsides: it still requires the lightning package to be installed and can only be run in a separate project/Python file, similar to this example: https://lightning.ai/lightning-ai/studios/code-lora-from-scratch . Further updates will be reported.


import torch
import lightning as L
import lightning.pytorch as pl

# Every `epoch_interval` epochs: re-sample the active layers, then rebuild the optimizer.
def on_train_epoch_start(self, trainer: "L.Trainer", pl_module: "pl.LightningModule"):
    if trainer.current_epoch % self.epoch_interval == 0:
        self.switch_active_layers()
        pl_module.optimizer_fn = torch.optim.Adam
        trainer.strategy.setup_optimizers(trainer)
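
Since LLaMA-Factory builds on the HuggingFace Trainer rather than Lightning, roughly the same idea can be written as a TrainerCallback. The sketch below is only illustrative (switch_active_layers and interval_steps are placeholders, not the code in this PR):

from transformers import TrainerCallback

class LisaCallback(TrainerCallback):
    def __init__(self, interval_steps, switch_active_layers):
        self.interval_steps = interval_steps
        self.switch_active_layers = switch_active_layers  # re-samples the trainable layers

    def on_step_begin(self, args, state, control, **kwargs):
        # Re-sample the active layers every `interval_steps` optimizer steps.
        if state.global_step % self.interval_steps == 0:
            self.switch_active_layers()

Unlike the Lightning version above, this does not rebuild the optimizer, so stale or missing optimizer state still has to be handled separately, which is part of what makes the ZeRO-3 case above tricky.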

qibaoyuan avatar Apr 07 '24 06:04 qibaoyuan

[Screenshot 2024-04-11 15:44:07: training loss curves]

I have conducted experiments on llama2-7b using the full, lisa_2, and lisa_32 methods. From the screenshot above, you can see that the training loss curve decreases and that full is the same as lisa_32.

The latest code borrows some of the implementation from LMFlow and Axolotl. Some implementation details have been cleaned up, and a debug option is provided.
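
For the debug option, a helper along these lines (purely hypothetical, not necessarily the PR's actual logging) makes it easy to verify which layers are active after each switch, again assuming a LLaMA-style model.model.layers layout:

def log_trainable_layers(model, logger):
    # Hypothetical helper: report which decoder layers currently require gradients.
    active = [
        idx for idx, layer in enumerate(model.model.layers)
        if any(p.requires_grad for p in layer.parameters())
    ]
    logger.info(f"LISA active decoder layers: {active}")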

Hope this will be merged.

qibaoyuan avatar Apr 11 '24 09:04 qibaoyuan

I tried this and noticed that fine-tuning Qwen/Qwen1.5-0.5B consumes more than 18 GB of VRAM with the following config. Is this expected? (A quick diagnostic sketch follows the system info below.)

Config
#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen1.5-0.5B \
    --dataset mhqg_1k \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --use_lisa \
    --lisa_activated_layers 2 \
    --lisa_interval_steps 5 \
    --output_dir ../../saves/Qwen1.5-0.5B/lisa/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 3192 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --fp16
System Info
$ uname -a
Linux 6bf7eb606868 5.4.0-152-generic #169-Ubuntu SMP Tue Jun 6 22:23:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5000               Off | 00000000:81:00.0 Off |                  Off |
| 30%   32C    P0              56W / 230W |      1MiB / 24564MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
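
One quick way to check whether the high VRAM usage comes from more parameters being trainable than intended is to count them right after LISA switches layers. This is a generic PyTorch diagnostic, not part of the PR:

def report_trainable(model):
    # With lisa_activated_layers=2, only a small fraction of the 0.5B parameters
    # (plus embeddings and the LM head) should require gradients.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")

If that fraction looks right, the remaining memory is more likely activations (cutoff_len is 3192 here) and gradient/optimizer buffers than extra trainable weights.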

neteroster avatar Apr 12 '24 05:04 neteroster

When I used LISA to fine-tune Llama-2-7b on alpaca_gpt4_en with one A100 80G, the memory usage increased sharply and exceeded 80 GB. I want to know how to solve this problem... (A rough memory estimate is sketched after the screenshots below.)

Config:

CUDA_VISIBLE_DEVICES=2 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --dataset alpaca_gpt4_en \
    --dataset_dir data \
    --template default \
    --finetuning_type full \
    --use_lisa 1 \
    --lisa_verbose 1 \
    --lisa_activated_layers 2 \
    --lisa_interval_steps 3 \
    --output_dir saves/Llama-2-7b-chat-lisa-2-3 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --warmup_steps 0 \
    --save_steps 30000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

Error: [screenshot]

GPU info when running: [screenshot]
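
For scale, here is a rough back-of-the-envelope estimate of why full-parameter training can spike past 80 GB if AdamW state ends up allocated for every parameter instead of only the activated layers. This is an assumption about the failure mode, not a confirmed diagnosis; the byte counts are the usual fp16/fp32 conventions:

# Back-of-the-envelope memory for Llama-2-7b (~7e9 params), under the assumption
# that gradients and AdamW state exist for *all* parameters, not just 2 layers.
params = 7e9
gib = 1024 ** 3
weights = params * 2 / gib            # ~13 GiB, fp16 weights
grads = params * 2 / gib              # ~13 GiB, fp16 gradients
adam_state = params * (4 + 4) / gib   # ~52 GiB, fp32 exp_avg + exp_avg_sq
print(f"~{weights + grads + adam_state:.0f} GiB before activations")  # ~78 GiB

If the AdamW state were limited to the two activated layers plus embeddings and the LM head, that ~52 GiB term would shrink to a few GiB, which is the saving LISA is supposed to provide.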

lovekdl avatar Apr 19 '24 16:04 lovekdl

@neteroster Hello. I have the same problem as you. Have you solved it?

lovekdl avatar Apr 23 '24 09:04 lovekdl

@lovekdl Not yet.

neteroster avatar Apr 23 '24 15:04 neteroster