
Support Qwen2

Open · yangjianxin1 opened this issue 1 year ago · 2 comments

We add support for Qwen2, which is important for the open-source community. Our repo Firefly already supports training Qwen2 with Unsloth; more experiment details can be found in our model card.

We evaluated the training gains on Qwen1.5-7B by using QLoRA with Unsloth to train for 20 steps on a single V100. The results are listed in the table below. In the largest configuration (max_seq_length=2048, batch size 4), Unsloth reduces GPU memory by 39.13% and training time by 32.12%, which corresponds to a 47.32% increase in training speed.

| max_seq_length | per_device_train_batch_size | gradient_accumulation_steps | use_unsloth | rank | GPU memory | Time |
| --- | --- | --- | --- | --- | --- | --- |
| 1024 | 1 | 16 | false | 8 | 13.72GB | 448s |
| 1024 | 1 | 16 | true | 8 | 8.43GB (-38.56%) | 308s (-31.25%) |
| 1024 | 1 | 16 | false | 64 | 16.01GB | 452s |
| 1024 | 1 | 16 | true | 64 | 11.07GB (-30.86%) | 311s (-31.19%) |
| 2048 | 1 | 16 | false | 64 | 18.55GB | 840s |
| 2048 | 1 | 16 | true | 64 | 12.99GB (-29.97%) | 596s (-29.05%) |
| 1024 | 4 | 4 | false | 64 | 24.70GB | 357s |
| 1024 | 4 | 4 | true | 64 | 14.36GB (-41.86%) | 253s (-29.13%) |
| 2048 | 4 | 4 | false | 64 | 32.51GB | 741s |
| 2048 | 4 | 4 | true | 64 | 19.79GB (-39.13%) | 503s (-32.12%) |
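For reference, here is a minimal sketch of the kind of run measured above. The hyperparameters mirror the max_seq_length=1024, rank=64 row; the dataset file name and formatting are placeholders, and the actual Firefly training script may differ.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 1024

# Load Qwen1.5-7B in 4-bit so the LoRA training runs as QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen1.5-7B",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters; r=64 matches the larger-rank rows in the table above.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset file: each JSON line is expected to contain a "text"
# field already formatted with the chat/instruction template.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        max_steps=20,              # 20 training steps, as in the benchmark
        learning_rate=2e-4,
        fp16=True,                 # V100 has no bf16 support
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```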

We also evaluated our SFT and DPO models trained with Unsloth on the Open LLM Leaderboard; they achieve good performance and outperform the official Qwen1.5-7B-Chat.

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| firefly-gemma-7b | 62.93 | 62.12 | 79.77 | 61.57 | 49.41 | 75.45 | 49.28 |
| firefly-qwen1.5-en-7b-dpo-v0.1-unsloth | 62.65 | 56.14 | 75.5 | 60.87 | 58.09 | 70.72 | 54.59 |
| zephyr-7b-beta | 61.95 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 29.04 |
| firefly-qwen1.5-en-7b-unsloth | 61.81 | 54.27 | 76.22 | 61.55 | 50.62 | 70.48 | 57.7 |
| vicuna-13b-v1.5 | 55.41 | 57.08 | 81.24 | 56.67 | 51.51 | 74.66 | 11.3 |
| Xwin-LM-13B-V0.1 | 55.29 | 62.54 | 82.8 | 56.53 | 45.96 | 74.27 | 9.63 |
| Qwen1.5-7B-Chat | 55.15 | 55.89 | 78.56 | 61.65 | 53.54 | 67.72 | 13.57 |
| gemma-7b-it | 53.56 | 51.45 | 71.96 | 53.52 | 47.29 | 67.96 | 29.19 |

yangjianxin1 · May 05 '24 16:05

@yangjianxin1 Oh wait does Qwen2 not have that weird alternating sliding window & normal attention thingo?

danielhanchen · May 05 '24 17:05

Yes, there is no weird alternating sliding window & normal attention in Qwen2, and its `use_sliding_window` is false in the config.json. I have also compared the code between Llama and Qwen2 almost line by line; they are very similar.
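For anyone who wants to verify this, the setting can be read straight from the model config (assuming the Hugging Face id `Qwen/Qwen1.5-7B`):

```python
# Check the sliding-window flag directly from the published config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-7B")
print(config.use_sliding_window)  # False -- no alternating sliding-window attention
```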

yangjianxin1 · May 06 '24 06:05

Thanks for the PR again! I streamlined Qwen2 to call FastMistralModel (since I think it's an exact replica right?)
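This is not the actual Unsloth source, just a sketch of the idea: since Qwen2's architecture matches Mistral's with sliding-window attention disabled, a Qwen2 loader can simply reuse the Mistral fast path. The class name and import path below are assumptions.

```python
# Sketch only: reuse the Mistral fast path for Qwen2 (import path assumed).
from unsloth.models.mistral import FastMistralModel

class FastQwen2Model(FastMistralModel):
    """Qwen2 dispatches to the Mistral kernels, since the architectures match."""
    pass
```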

danielhanchen · May 10 '24 17:05

Could you please provide a detailed explanation of the specific process of fine-tuning Qwen1.5-7B-Chat using Unsloth? I want to fine-tune Qwen1.5-7B myself.

NeoFii · May 16 '24 12:05