LLM-Adapters

Couldn't get the same accuracy with eight commonsense reasoning datasets.

Open · ello0211 opened this issue · 7 comments

Hi, thanks for your great work! When I try to reproduce the results on the eight commonsense reasoning datasets, the accuracy is not as good as in the table. The settings I used are the same as for the math reasoning tasks shown in the README. Could you tell me whether these are the right settings, or show me the right way to reproduce the accuracy reported in the table? Thank you so much!

ello0211 · Sep 13, 2023

Hi,

The settings are a little different. I have listed the commands below.

For LoRA:

CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

For Series Adapter:

CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-bottleneck-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --target_modules '["down_proj"]'

For Parallel Adapter:

CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-parallel-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --use_parallel_adapter --target_modules '["up_proj", "down_proj"]'
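
For evaluation, a minimal loop over the eight datasets might look like the sketch below. The dataset names are an assumption and should match the folders in the repo's dataset/ directory; the adapter weights path matches the LoRA --output_dir above.

# Hypothetical sketch: run commonsense_evaluate.py once per dataset.
for ds in boolq piqa social_i_qa hellaswag winogrande ARC-Challenge ARC-Easy openbookqa; do
    CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
        --model LLaMA-7B \
        --adapter LoRA \
        --dataset "$ds" \
        --batch_size 4 \
        --base_model 'yahma/llama-7b-hf' \
        --lora_weights './trained_models/llama-7b-lora-commonsense/'
done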

HZQ950419 · Sep 13, 2023

OK! I will try these later, thanks a lot.

ello0211 · Sep 13, 2023

@ello0211 Hi, did you manage to get the same results as reported in the table? Thanks!

nbasyl · Nov 17, 2023

Sorry, I didn't run the experiment with exactly the parameters you provided. Instead, I used LoRA on q_proj and v_proj with r=4 and obtained slightly worse results. By the way, it seems that configuring LoRA as you suggested would result in a large number of trainable parameters, right?

ello0211 · Nov 27, 2023

Hi, with r=32 the number of LoRA parameters should be 8 times that of r=4, since the parameter count scales linearly with r.
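
For a rough sense of scale, here is a back-of-the-envelope count, assuming LLaMA-7B shapes (32 layers, hidden size 4096, MLP intermediate size 11008) and that each adapted d_out-by-d_in weight gets r*(d_in + d_out) trainable LoRA parameters; treat it as a sketch, not an official number.

# Back-of-the-envelope LoRA parameter count (assumed LLaMA-7B shapes).
layers=32; hidden=4096; inter=11008; r=32
# q_proj, k_proj, v_proj are hidden -> hidden projections.
attn=$(( 3 * r * (hidden + hidden) ))
# up_proj is hidden -> inter, down_proj is inter -> hidden.
mlp=$(( r * (hidden + inter) + r * (inter + hidden) ))
echo "suggested config (5 modules, r=32): $(( layers * (attn + mlp) ))"            # ~56M
echo "q_proj/v_proj only, r=4:            $(( layers * 2 * 4 * (hidden + hidden) ))"   # ~2.1M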

HZQ950419 · Dec 07, 2023

@HZQ950419 I fine-tuned on commonsense_170k.json with LoRA following your script, only changing eval_step and save_step:

CUDA_VISIBLE_DEVICES=0 python finetune.py \
        --base_model 'yahma/llama-7b-hf'  \
        --data_path 'commonsense_170k.json'   \
        --output_dir $output_path   \
        --batch_size 16  \
        --micro_batch_size 4   \
        --num_epochs 3   \
        --learning_rate 3e-4   \
        --cutoff_len 256   \
        --val_set_size 120 \
        --eval_step 80 \
        --save_step 80  \
        --adapter_name lora \
        --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
        --lora_r 32 \
        --lora_alpha 64 

And evaluated with this script:

CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
    --model LLaMA-7B \
    --adapter LoRA \
    --dataset $dataset \
    --batch_size 4 \
    --base_model 'yahma/llama-7b-hf' \
    --lora_weights $weights_path 

But I still couldn't reproduce the accuracy reported in the table: only 0.6715 on BoolQ and 0.3884 on PIQA. Can you help me check the problem? Also, if I want to reproduce the LLaMA-13B results on commonsense_170k.json, how should I set the parameters? Thank you so much!

ls559 · Dec 18, 2023

Hi, the command is the same as the one we use. Are you using multiple GPUs for fine-tuning? If so, maybe try fine-tuning on a single GPU; some other researchers also couldn't reproduce the results with multi-GPU training.

HZQ950419 · Jan 09, 2024