qlora-chinese-LLM
Fine-tune Chinese large language models with QLoRA; supports ChatGLM, Chinese-LLaMA-Alpaca, and BELLE.
RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1....
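As the message itself suggests, a common first step when debugging a device-side assert is to make CUDA launches synchronous so the Python stack trace points at the kernel that actually failed. A minimal sketch (set this before any CUDA work, e.g. at the very top of qlora.py):

```python
import os

# Must be set before torch (or any CUDA library) launches its first kernel:
# synchronous launches make the stack trace point at the op that actually
# triggered the device-side assert, at some speed cost.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])  # "1"
```

Equivalently, run `CUDA_LAUNCH_BLOCKING=1 python qlora.py ...` from the shell.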
Question
Has anyone else hit the error ValueError: paged_adamw_32bit is not a valid OptimizerNames?
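For reference, the paged optimizers (paged_adamw_32bit among them) only exist in recent transformers releases; to my knowledge they were added around 4.30.0, so older installs reject the name. A minimal version guard (the helper name is hypothetical, not from this repo):

```python
# Hypothetical guard: older transformers releases raise the OptimizerNames
# ValueError because paged_adamw_32bit is not in their enum yet.
def supports_paged_optim(transformers_version: str) -> bool:
    # Compare only major.minor; paged optimizers appeared around 4.30.0.
    major, minor = (int(p) for p in transformers_version.split(".")[:2])
    return (major, minor) >= (4, 30)

print(supports_paged_optim("4.28.1"))  # False: upgrade before using paged_adamw_32bit
print(supports_paged_optim("4.30.2"))  # True
```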
Quick question: how do I merge ChatGLM and the LoRA weights into a single model?
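In practice this merge is usually done with peft: load the base model, wrap it with PeftModel.from_pretrained(base, adapter_dir), call merge_and_unload(), then save_pretrained. What that call does to each targeted weight matrix can be sketched in NumPy (toy shapes, not ChatGLM's real ones):

```python
import numpy as np

# What peft's merge_and_unload does per targeted weight, in miniature:
# fold the LoRA update B @ A (scaled by lora_alpha / lora_r) into the
# frozen base weight W, leaving a plain dense matrix with no adapter.
rng = np.random.default_rng(0)
d, r, alpha = 6, 2, 32            # hidden size, lora_r, lora_alpha
W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # LoRA down-projection
B = np.zeros((d, r))              # LoRA up-projection (zero-initialised)

W_merged = W + (alpha / r) * (B @ A)

# With B still at its zero init the merge is a no-op, which is why a
# freshly initialised adapter leaves the base model's outputs unchanged:
print(np.allclose(W_merged, W))   # True
```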
When I train, the ratio is as high as 0.86, with target_modules set to 'dense', 'dense_4h_to_h', 'dense_h_to_4h', 'query_key_value'.
(gh_qlora-chinese-LLM) ub2004@ub2004-B85M-A0:~/llm_dev/qlora-chinese-LLM$ python3 qlora.py --model_name="chatglm" --model_name_or_path="/data-ssd-1t/hf_model/chatglm-6b" --trust_remote_code=True --dataset="msra" --source_max_len=128 --target_max_len=64 --do_train --save_total_limit=1 --padding_side="left" --per_device_train_batch_size=8 --do_eval --bits=4 --save_steps=10 --gradient_accumulation_steps=1 --learning_rate=1e-5 --output_dir="./output/chatglm-6b/" --lora_r=8 --lora_alpha=32
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports,...
Trying to train on a 12 GB card:
python qlora.py --model_name="chinese_alpaca" --model_name_or_path="./model_hub/chinese-alpaca-7b" --trust_remote_code=False --dataset="msra" --source_max_len=128 --target_max_len=64 --do_train --save_total_limit=1 --padding_side="right" --per_device_train_batch_size=8 --do_eval --bits=4 --save_steps=10 --gradient_accumulation_steps=1 --learning_rate=1e-5 --output_dir="./output/alpaca/" --lora_r=8 --lora_alpha=32
It fails with:
File "/mnt/data1ts/llm/training/qlora-chinese-LLM/qlora.py", line 1012, in train() File "/mnt/data1ts/llm/training/qlora-chinese-LLM/qlora.py",...
https://github.com/taishan1994/qlora-chinese-LLM/blob/b9495bfc3c74188054b9d6c1b8fa26ceb8e81b20/chat.py#L118 Same peft version, but I get an error. Does it run in your environment? Also, deleting this parameter makes it work.
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Or the error ValueError: You can't train a model that...
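One fix commonly reported for this accelerate/bitsandbytes check is to pin the quantized model to a single device at load time (via device_map) instead of moving it with .to() afterwards. A minimal sketch of the load kwargs, with a helper name that is mine rather than the repo's:

```python
# Hypothetical helper: a model quantized with bitsandbytes must stay on the
# device it was loaded on, so pin it to one GPU at load time rather than
# calling .to(device) later ("auto" sharding across devices also trips the
# 8-bit training check).
def quantized_load_kwargs(gpu_index: int = 0) -> dict:
    return {
        "load_in_8bit": True,          # bitsandbytes 8-bit quantization
        "device_map": {"": gpu_index}, # place the whole model on one GPU
    }

print(quantized_load_kwargs(0))  # pass as **kwargs to AutoModel.from_pretrained
```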