ms-swift
Slow training of qwen1.5-moe-A2.7B-chat-gptq-int4 on Win10

Describe the bug
I ran the following command in PowerShell:
```powershell
D:\github\ENV\qwen\Scripts\python.exe d:\github\swift\swift\cli\sft.py `
  --model_type qwen1half-moe-a2_7b-chat-int4 `
  --model_id_or_path "D:\models\Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4" `
  --sft_type lora `
  --dtype AUTO `
  --output_dir "D:\github\swift\output" `
  --train_dataset_sample -1 `
  --num_train_epochs 3 `
  --max_length 1024 `
  --check_dataset_strategy warning `
  --lora_rank 8 `
  --lora_alpha 32 `
  --lora_dropout_p 0.05 `
  --lora_target_modules ALL `
  --gradient_checkpointing true `
  --batch_size 1 `
  --weight_decay 0.1 `
  --learning_rate 2e-5 `
  --gradient_accumulation_steps 16 `
  --max_grad_norm 1.0 `
  --warmup_ratio 0.03 `
  --eval_steps 50 `
  --save_steps 50 `
  --save_total_limit 3 `
  --logging_steps 10 `
  --use_flash_attn false `
  --self_cognition_sample 1000 `
  --model_name 风语诗人 'FengPoet' `
  --model_author 'Geoffery' `
  --custom_train_dataset_path "D:\dataset\poems_processed\rhyme\train_d1.jsonl" `
  --custom_val_dataset_path "D:\dataset\poems_processed\rhyme\val_d1.jsonl"
```
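For reference, a minimal sanity check of what these flags imply (the numbers are copied directly from the command above): with `--batch_size 1` and `--gradient_accumulation_steps 16`, each optimizer update consumes 16 samples.

```python
# Values copied from the sft.py flags above
batch_size = 1
gradient_accumulation_steps = 16

# Samples consumed per optimizer update
effective_batch_size = batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 16
```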
Training turned out to be very slow: the estimated total time is roughly 70+ hours.
```
[INFO:swift] Model file config.json is different from the latest version `master`,This is because you are using an older version or the file is updated manually.
[INFO:swift] The SftArguments will be saved in: D:\github\swift\output\qwen1half-moe-a2_7b-chat-int4\v0-20240426-150438\sft_args.json
[INFO:swift] The Seq2SeqTrainingArguments will be saved in: D:\github\swift\output\qwen1half-moe-a2_7b-chat-int4\v0-20240426-150438\training_args.json
[INFO:swift] The logging file will be saved in: D:\github\swift\output\qwen1half-moe-a2_7b-chat-int4\v0-20240426-150438\logging.jsonl
Train: 0%| | 0/3936 [00:00<?, ?it/s]2024-04-26 15:06:06,967 - modelscope - INFO - PyTorch version 2.3.0+cu118 Found.
2024-04-26 15:06:06,967 - modelscope - INFO - Loading ast index from C:\Users\Administrator\.cache\modelscope\ast_indexer
2024-04-26 15:06:07,013 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 81f3d6fd46847ddcf779e2d1e42341be and a total number of 976 components indexed
D:\github\ENV\qwen\lib\site-packages\transformers\models\qwen2_moe\modeling_qwen2_moe.py:775: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
{'loss': 5.03665495, 'acc': 0.35384867, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.0, 'global_step': 1}
Train: 0%| | 2/3936 [02:21<76:37:20, 70.12s/it]
```
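The ~76-hour ETA in the progress bar is consistent with the measured per-step time, so the slowness is uniform rather than a one-off stall. A quick back-of-the-envelope check (numbers taken from the tqdm line above):

```python
# Figures read off the progress bar: "2/3936 [02:21<76:37:20, 70.12s/it]"
total_steps = 3936        # total training steps
seconds_per_step = 70.12  # measured seconds per iteration

total_hours = total_steps * seconds_per_step / 3600
print(round(total_hours, 1))  # → 76.7, matching the reported 76:37:20 ETA
```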
Your hardware and system info
CUDA 11.8, Windows 10, GPU: RTX A5000, torch version: 2.3.0+cu118
Additional context
When training started, it warned that bitsandbytes was not installed, so I installed the Windows build with `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`. I don't understand where the problem is. Looking forward to a reply, thanks.