Baichuan2

Published 20 hours ago •


Fine-tuning baichuan2-7b-base with LoRA on 3×V100 (16 GB) still OOMs. How much memory is actually needed?

Open lvjianxin opened this issue 1 year ago • 7 comments

As the title says. I am using the default data:

deepspeed --include=localhost:4,5,7 fine-tune.py \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "/home/admin/baichuan2/baichuan-inc/Baichuan2-7B-Base" \
    --output_dir "output" \
    --model_max_length 64 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 False \
    --tf32 False \
    --use_lora True

Sep 21 '23 06:09 lvjianxin

With 4×A100 (40 GB) and batch size 1, full fine-tuning of Baichuan2-7B-Base still runs out of memory (OOM).

Sep 22 '23 02:09 MrRace

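Based on my own tests fine-tuning the 7B model, what you have is definitely not enough. The V100 does not support bf16, so just loading the weights takes roughly 7 × 4 = 28 GB or a bit more. Fine-tuning roughly doubles that. If nothing goes wrong, the minimum is 50-some GB. In practice, I estimate that a single A100 with 80 GB of memory would be enough.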
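The back-of-the-envelope estimate above can be written out as a short Python sketch. It assumes 7 billion parameters, 4 bytes per parameter in fp32, and 1 GB = 10^9 bytes. The "roughly double" factor is taken from the comment as a rule of thumb, not a measured value, and the function name and its parameters are illustrative.

```python
def estimate_memory_gb(n_params: float, bytes_per_param: int,
                       training_factor: float = 1.0) -> float:
    """Rough memory estimate in GB (1 GB = 1e9 bytes).

    training_factor is a rule-of-thumb multiplier for fine-tuning overhead;
    1.0 means "just load the weights".
    """
    return n_params * bytes_per_param * training_factor / 1e9


N_PARAMS = 7e9   # assumed parameter count for a "7B" model
FP32_BYTES = 4   # fp32 weights, since V100 has no bf16 support

load_gb = estimate_memory_gb(N_PARAMS, FP32_BYTES)
train_gb = estimate_memory_gb(N_PARAMS, FP32_BYTES, training_factor=2.0)

print(f"load: {load_gb:.0f} GB, fine-tune: {train_gb:.0f} GB")
```

This only reproduces the arithmetic in the comment. It does not separately model activations, gradients, or optimizer state, so real usage may differ.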

Sep 22 '23 09:09 friendmine

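Could the maintainers weigh in? I used 5 V100s (16 GB each), 80 GB in total, with LoRA, and it still OOMs. How much memory is actually needed? With baichuan1-base, 2 V100s were enough. What is going on?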

Sep 24 '23 14:09 lvjianxin

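"With 4×A100 (40 GB) and batch size 1, full fine-tuning of Baichuan2-7B-Base still runs out of memory (OOM)." (quoting MrRace above)

With 6×A100 (40 GB) and batch size 2, full fine-tuning of Baichuan2-7B-Chat runs.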

Dec 07 '23 03:12 Xu-pixel

@Xu-pixel

Dec 07 '23 03:12 Xu-pixel

If I only have V100s, how can I fine-tune a 7B model? With LoRA, I have read online that bf16 can be replaced with float16. Does that work? @Xu-pixel

Jan 05 '24 08:01 sherlock1987

@sherlock1987 QLoRA works. See the FireFly repository for details.
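For readers unfamiliar with QLoRA, here is a minimal sketch of what a 4-bit load plus LoRA adapters might look like with `transformers`, `peft`, and `bitsandbytes` (with `accelerate` installed for `device_map="auto"`). This is not the FireFly repository's code. The function name `build_qlora_model`, the LoRA hyperparameters, and the `W_pack` target module name are assumptions for illustration, and float16 is chosen as the compute dtype because the V100 lacks bf16.

```python
def build_qlora_model(model_path: str):
    """Load a causal LM in 4-bit and attach LoRA adapters (QLoRA-style sketch)."""
    # Imports live inside the function so that defining it does not require
    # a GPU or the heavy libraries to be present.
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Quantize the frozen base weights to 4-bit NF4; compute in float16.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=bnb_config,
        trust_remote_code=True,  # Baichuan2 ships its own modeling code
        device_map="auto",
    )
    # Standard peft preparation step for k-bit training.
    model = prepare_model_for_kbit_training(model)

    # Illustrative hyperparameters; "W_pack" is the assumed name of the
    # fused attention projection in Baichuan models.
    lora_config = LoraConfig(
        task_type="CAUSAL_LM",
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["W_pack"],
    )
    return get_peft_model(model, lora_config)


# Hypothetical usage (requires a GPU and the model weights on disk):
# model = build_qlora_model("baichuan-inc/Baichuan2-7B-Base")
# model.print_trainable_parameters()
```

Only the small LoRA matrices are trained, and the base weights stay quantized. That is why this approach can fit where fp32 LoRA does not.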

Jan 05 '24 08:01 Xu-pixel