LLaMA-Factory
Qwen1.5-MoE-A2.7B training issue
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
#!/bin/bash
export USE_MODELSCOPE_HUB=1
deepspeed --num_gpus 1 ../../src/train_bash.py \
    --deepspeed ../deepspeed/ds_z3_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen1.5-MoE-A2.7B-Chat \
    --dataset our_alpaca \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --output_dir ../../output/case1 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 2000 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 6.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --ddp_timeout 180000000 \
    --plot_loss \
    --fp16 True
Expected behavior
My transformers version is already 4.39.3.
System Info
ValueError: The checkpoint you are trying to load has model type qwen2_moe but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
Others
No response
You need to use the preview (dev) build of transformers; qwen2_moe support is not included in 4.39.3.
Please install it with: pip install git+https://github.com/huggingface/transformers
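After installing the dev build, you can verify that it recognizes the architecture before relaunching training. A minimal sketch, assuming network access to fetch the model's config.json from the Hugging Face Hub (adjust the path if you load from a local or ModelScope copy):

# Check that the installed transformers build knows the qwen2_moe model type.
import transformers
from transformers import AutoConfig

print(transformers.__version__)  # a dev build newer than 4.39.3 is expected

# AutoConfig.from_pretrained runs the same model-type lookup that raised the
# ValueError above, so this fails fast if qwen2_moe is still unrecognized.
config = AutoConfig.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B-Chat")
print(config.model_type)  # should print: qwen2_moe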