
How can I fix this error during LoRA fine-tuning?

Open · kuang1216 opened this issue 1 year ago · 0 comments

[2024-06-05 14:39:53,109] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:53,937] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-06-05 14:39:53,937] [INFO] [runner.py:568:main] cmd = /root/anaconda3/envs/kh/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_clm_lora.py --model_name_or_path /AI2024/kh/Models/Meta-Llama-3-8B-Instruct --train_files ../../data/train_sft.csv --validation_files ../../data/dev_sft.csv ../../data/dev_sft_sharegpt.csv --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --do_train --do_eval --use_fast_tokenizer false --output_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model --evaluation_strategy steps --max_eval_samples 800 --learning_rate 1e-4 --gradient_accumulation_steps 8 --num_train_epochs 10 --warmup_steps 400 --load_in_bits 4 --lora_r 8 --lora_alpha 32 --target_modules q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj --logging_dir /AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs --logging_strategy steps --logging_steps 10 --save_strategy steps --preprocessing_num_workers 10 --save_steps 20 --eval_steps 20 --save_total_limit 2000 --seed 42 --disable_tqdm false --ddp_find_unused_parameters false --block_size 2048 --report_to tensorboard --overwrite_output_dir --deepspeed ds_config_zero2.json --ignore_data_skip true --bf16 --gradient_checkpointing --bf16_full_eval --ddp_timeout 18000000
[2024-06-05 14:39:55,793] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-05 14:39:56,609] [INFO] [launch.py:139:main] 0 NCCL_P2P_DISABLE=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0]}
[2024-06-05 14:39:56,609] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-06-05 14:39:56,609] [INFO] [launch.py:164:main] dist_world_size=1
[2024-06-05 14:39:56,609] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-06-05 14:39:56,609] [INFO] [launch.py:256:main] process 87440 spawned with command: ['/root/anaconda3/envs/kh/bin/python', '-u', 'finetune_clm_lora.py', '--local_rank=0', '--model_name_or_path', '/AI2024/kh/Models/Meta-Llama-3-8B-Instruct', '--train_files', '../../data/train_sft.csv', '--validation_files', '../../data/dev_sft.csv', '../../data/dev_sft_sharegpt.csv', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--do_train', '--do_eval', '--use_fast_tokenizer', 'false', '--output_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model', '--evaluation_strategy', 'steps', '--max_eval_samples', '800', '--learning_rate', '1e-4', '--gradient_accumulation_steps', '8', '--num_train_epochs', '10', '--warmup_steps', '400', '--load_in_bits', '4', '--lora_r', '8', '--lora_alpha', '32', '--target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj', '--logging_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs', '--logging_strategy', 'steps', '--logging_steps', '10', '--save_strategy', 'steps', '--preprocessing_num_workers', '10', '--save_steps', '20', '--eval_steps', '20', '--save_total_limit', '2000', '--seed', '42', '--disable_tqdm', 'false', '--ddp_find_unused_parameters', 'false', '--block_size', '2048', '--report_to', 'tensorboard', '--overwrite_output_dir', '--deepspeed', 'ds_config_zero2.json', '--ignore_data_skip', 'true', '--bf16', '--gradient_checkpointing', '--bf16_full_eval', '--ddp_timeout', '18000000']
[2024-06-05 14:39:58,562] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
Traceback (most recent call last):
  File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1535, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/kh/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/trainer.py", line 180, in <module>
    from apex import amp
  File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/apex/__init__.py", line 13, in <module>
    from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/AI2024/kh/Finetune/Llama-Chinese/train/sft/finetune_clm_lora.py", line 48, in <module>
    from transformers import (
  File "<frozen importlib._bootstrap>", line 1229, in _handle_fromlist
  File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1525, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/kh/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1537, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback): cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)
[2024-06-05 14:40:01,613] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 87440
[2024-06-05 14:40:01,614] [ERROR] [launch.py:325:sigkill_handler] ['/root/anaconda3/envs/kh/bin/python', '-u', 'finetune_clm_lora.py', '--local_rank=0', '--model_name_or_path', '/AI2024/kh/Models/Meta-Llama-3-8B-Instruct', '--train_files', '../../data/train_sft.csv', '--validation_files', '../../data/dev_sft.csv', '../../data/dev_sft_sharegpt.csv', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--do_train', '--do_eval', '--use_fast_tokenizer', 'false', '--output_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model', '--evaluation_strategy', 'steps', '--max_eval_samples', '800', '--learning_rate', '1e-4', '--gradient_accumulation_steps', '8', '--num_train_epochs', '10', '--warmup_steps', '400', '--load_in_bits', '4', '--lora_r', '8', '--lora_alpha', '32', '--target_modules', 'q_proj,k_proj,v_proj,o_proj,down_proj,gate_proj,up_proj', '--logging_dir', '/AI2024/kh/Finetune/Llama-Chinese/finetune_model/logs', '--logging_strategy', 'steps', '--logging_steps', '10', '--save_strategy', 'steps', '--preprocessing_num_workers', '10', '--save_steps', '20', '--eval_steps', '20', '--save_total_limit', '2000', '--seed', '42', '--disable_tqdm', 'false', '--ddp_find_unused_parameters', 'false', '--block_size', '2048', '--report_to', 'tensorboard', '--overwrite_output_dir', '--deepspeed', 'ds_config_zero2.json', '--ignore_data_skip', 'true', '--bf16', '--gradient_checkpointing', '--bf16_full_eval', '--ddp_timeout', '18000000'] exits with return code = 1
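
The last frame of the first traceback shows apex/__init__.py trying to import pyramid.session, which is characteristic of the unrelated `apex` package on PyPI rather than NVIDIA Apex; transformers.trainer runs `from apex import amp` whenever an importable `apex` module is found. Below is a minimal check sketch under that assumption (the diagnosis and the suggested `pip uninstall apex` are not confirmed by this log, only suggested by it):

```python
# Sketch: confirm which "apex" package is installed.
# Assumption: the pyramid.session import seen in the traceback comes from the
# unrelated PyPI "apex" package; NVIDIA Apex (built from source) exposes apex.amp.
import importlib.util

spec = importlib.util.find_spec("apex")
if spec is None:
    print("no apex package installed - transformers will skip the apex import")
else:
    print("apex resolves to:", spec.origin)
    try:
        from apex import amp  # only NVIDIA Apex provides this submodule
        print("NVIDIA Apex with amp is available")
    except ImportError as err:
        print("this is not NVIDIA Apex; `pip uninstall apex` should clear the error:", err)
```

If the check shows the wrong package, uninstalling it should let finetune_clm_lora.py import transformers.trainer again; bf16 training through the Hugging Face Trainer does not require Apex, and NVIDIA Apex, if actually wanted, has to be built from the github.com/NVIDIA/apex sources rather than installed with `pip install apex`.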

kuang1216 · Jun 05 '24 06:06