Llama-Chinese
How do I fix this error when running LoRA fine-tuning?
[2024-06-05 14:39:53,109] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:53,937] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-06-05 14:39:53,937] [INFO] [runner.py:568:main] cmd = /root/anaconda3/envs/kh/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_clm_lora.py --model_name_or_path /AI2024/kh/Models/Meta-Llama-3-8B-Instruct --train_files ../../data/train_sft.csv --validation_files ../../data/dev_sft.csv ../../data/dev_sft_sharegpt.csv --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --do_train --do_eval --use_fast_tokenizer false --output_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model --evaluation_strategy steps --max_eval_samples 800 --learning_rate 1e-4 --gradient_accumulation_steps 8 --num_train_epochs 10 --warmup_steps 400 --load_in_bits 4 --lora_r 8 --lora_alpha 32 --target_modules q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj --logging_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs --logging_strategy steps --logging_steps 10 --save_strategy steps --preprocessing_num_workers 10 --save_steps 20 --eval_steps 20 --save_total_limit 2000 --seed 42 --disable_tqdm false --ddp_find_unused_parameters false --block_size 2048 --report_to tensorboard --overwrite_output_dir --deepspeed ds_config_zero2.json --ignore_data_skip true --bf16 --gradient_checkpointing --bf16_full_eval --ddp_timeout 18000000
[2024-06-05 14:39:55,793] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:56,609] [INFO] [launch.py:139:main] 0 NCCL_P2P_DISABLE=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0]}
[2024-06-05 14:39:56,609] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-06-05 14:39:56,609] [INFO] [launch.py:164:main] dist_world_size=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:256:main] process 87440 spawned with command: ['/root/anaconda3/envs/kh/bin/python', '-u', 'finetune_clm_lora.py', '--local_rank=0', '--model_name_or_path', '/AI2024/kh/Models/Meta-Llama-3-8B-Instruct', '--train_files', '../../data/train_sft.csv', '--validation_files', '../../data/dev_sft.csv', '../../data/dev_sft_sharegpt.csv', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--do_train', '--do_eval', '--use_fast_tokenizer', 'false', '--output_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model', '--evaluation_strategy', 'steps', '--max_eval_samples', '800', '--learning_rate', '1e-4', '--gradient_accumulation_steps', '8', '--num_train_epochs', '10', '--warmup_steps', '400', '--load_in_bits', '4', '--lora_r', '8', '--lora_alpha', '32', '--target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj', '--logging_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs', '--logging_strategy', 'steps', '--logging_steps', '10', '--save_strategy', 'steps', '--preprocessing_num_workers', '10', '--save_steps', '20', '--eval_steps', '20', '--save_total_limit', '2000', '--seed', '42', '--disable_tqdm', 'false', '--ddp_find_unused_parameters', 'false', '--block_size', '2048', '--report_to', 'tensorboard', '--overwrite_output_dir', '--deepspeed', 'ds_config_zero2.json', '--ignore_data_skip', 'true', '--bf16', '--gradient_checkpointing', '--bf16_full_eval', '--ddp_timeout', '18000000']
[2024-06-05 14:39:58,562] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
Traceback (most recent call last):
File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1535, in _get_module
return importlib.import_module("." + module_name, self.__name__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/kh/lib/python3.11/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/AI2024/kh/Finetune/Llama-Chinese/train/sft/finetune_clm_lora.py", line 48, in