Support Qwen2
We add support for Qwen2, which is important for the open-source community. Our repo Firefly already supports training Qwen2 with Unsloth; more experiment details can be found in our model card.
We evaluated the training gains on Qwen1.5-7B, using QLoRA with and without Unsloth to train for 20 steps on a single V100. The results are listed below; for example, in the 2048-token, batch-size-4 setting, Unsloth reduces GPU memory by 39.13% and training time by 32.12%, which corresponds to a 47.32% increase in training speed. A minimal reproduction sketch follows the table.
| max_seq_length | per_device_train_batch_size | gradient_accumulation_steps | use_unsloth | rank | GPU memory | Training time |
|---|---|---|---|---|---|---|
| 1024 | 1 | 16 | false | 8 | 13.72GB | 448s |
| 1024 | 1 | 16 | true | 8 | 8.43GB(-38.56%) | 308s(-31.25%) |
| 1024 | 1 | 16 | false | 64 | 16.01GB | 452s |
| 1024 | 1 | 16 | true | 64 | 11.07GB(-30.86%) | 311s(-31.19%) |
| 2048 | 1 | 16 | false | 64 | 18.55GB | 840s |
| 2048 | 1 | 16 | true | 64 | 12.99GB(-29.97%) | 596s(-29.05%) |
| 1024 | 4 | 4 | false | 64 | 24.70GB | 357s |
| 1024 | 4 | 4 | true | 64 | 14.36GB(-41.86%) | 253s(-29.13%) |
| 2048 | 4 | 4 | false | 64 | 32.51GB | 741s |
| 2048 | 4 | 4 | true | 64 | 19.79GB(-39.13%) | 503s(-32.12%) |
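For reference, here is a minimal sketch of the kind of run benchmarked above, following Unsloth's usual `FastLanguageModel` + TRL `SFTTrainer` pattern. The dataset and prompt template are placeholders (not the actual Firefly training setup); the hyperparameters mirror the first rows of the table.

```python
# Minimal sketch of the benchmarked setup (QLoRA + Unsloth, 20 steps).
# The dataset and prompt template below are placeholders, not Firefly's.
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen1.5-7B",
    max_seq_length = 1024,
    load_in_4bit = True,   # QLoRA: 4-bit quantized base weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 64,                # LoRA rank, as in the table above
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)

dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
def to_text(example):
    # Flatten instruction/input/output into one string (placeholder template).
    return {"text": f"{example['instruction']}\n{example['input']}\n{example['output']}"}
dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 1024,
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 16,
        max_steps = 20,    # matches the 20-step benchmark above
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),  # V100 has no bf16
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
    ),
)
trainer.train()
```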
We also evaluated our SFT and DPO models trained with Unsloth on the Open LLM Leaderboard; they achieve good performance and outperform the official Qwen1.5-7B-Chat. A reproduction sketch follows the table.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| firefly-gemma-7b | 62.93 | 62.12 | 79.77 | 61.57 | 49.41 | 75.45 | 49.28 |
| firefly-qwen1.5-en-7b-dpo-v0.1-unsloth | 62.65 | 56.14 | 75.5 | 60.87 | 58.09 | 70.72 | 54.59 |
| zephyr-7b-beta | 61.95 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 29.04 |
| firefly-qwen1.5-en-7b-unsloth | 61.81 | 54.27 | 76.22 | 61.55 | 50.62 | 70.48 | 57.7 |
| vicuna-13b-v1.5 | 55.41 | 57.08 | 81.24 | 56.67 | 51.51 | 74.66 | 11.3 |
| Xwin-LM-13B-V0.1 | 55.29 | 62.54 | 82.8 | 56.53 | 45.96 | 74.27 | 9.63 |
| Qwen1.5-7B-Chat | 55.15 | 55.89 | 78.56 | 61.65 | 53.54 | 67.72 | 13.57 |
| gemma-7b-it | 53.56 | 51.45 | 71.96 | 53.52 | 47.29 | 67.96 | 29.19 |
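These leaderboard numbers can be roughly approximated locally with EleutherAI's lm-evaluation-harness (v0.4+), using the leaderboard's per-task few-shot settings. A sketch follows; the model repo id is an assumption, and local numbers may differ slightly from the hosted runs.

```python
# Rough local approximation of the Open LLM Leaderboard evaluation with
# EleutherAI's lm-evaluation-harness (pip install lm-eval). Results may
# differ slightly from the hosted leaderboard; the model id is assumed.
import lm_eval

TASKS = {                 # task -> few-shot count, per the leaderboard setup
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,
    "winogrande": 5,
    "gsm8k": 5,
}

for task, shots in TASKS.items():
    results = lm_eval.simple_evaluate(
        model = "hf",
        model_args = "pretrained=YeungNLP/firefly-qwen1.5-en-7b-unsloth",  # assumed repo id
        tasks = [task],
        num_fewshot = shots,
        batch_size = 8,
    )
    print(task, results["results"][task])
```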
@yangjianxin1 Oh wait does Qwen2 not have that weird alternating sliding window & normal attention thingo?
Yes, there is no weird alternating sliding window & normal attention in Qwen2, and its `use_sliding_window` is false in the config.json.
I have also compared the Llama and Qwen2 code almost line by line; they are very similar.
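This is easy to check directly from the Hub config with the standard transformers API, e.g.:

```python
# Verify that Qwen1.5 (Qwen2 architecture) disables sliding-window attention.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-7B")
print(config.model_type)          # "qwen2"
print(config.use_sliding_window)  # False
```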
Thanks for the PR again! I streamlined Qwen2 to call FastMistralModel (since I think it's an exact replica, right?)
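In other words, the change amounts to a thin alias rather than a separate implementation. A sketch of the shape (the import path and wrapper class name here are illustrative, not necessarily Unsloth's actual internals):

```python
# Illustrative sketch only: Qwen2 matches Mistral's architecture
# (no alternating sliding-window attention), so the Qwen2 fast path
# can simply reuse FastMistralModel. Names other than FastMistralModel
# are assumptions, not Unsloth's actual internals.
from unsloth.models.mistral import FastMistralModel

class FastQwen2Model(FastMistralModel):
    """Qwen2 shares Mistral's layer layout and sets use_sliding_window=False,
    so Mistral's fused kernels and patches apply unchanged."""
    pass
```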
Could you please provide a detailed explanation of the specific process of fine-tuning Qwen1.5-7B-Chat using Unsloth? I want to fine-tune Qwen1.5-7B myself.