BELLE LoRA weight file is 14 GB

I used BELLE-7B-2M as the base model and ran LoRA fine-tuning with src/train.py:

torchrun --nproc_per_node 8 train.py \
    --model_name_or_path ${model_name_or_path} \
    --deepspeed configs/deepspeed_config_stage3.json \
    --use_lora True \
    --lora_config configs/lora_config_bloom.json \
    --train_file ${train_file} \
    --validation_file ${validation_file} \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 3 \
    --model_max_length ${cutoff_len} \
    --save_strategy "steps" \
    --save_total_limit 3 \
    --learning_rate 8e-6 \
    --weight_decay 0.00001 \
    --warmup_ratio 0.05 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --evaluation_strategy "steps" \
    --fp16 True \
    --seed 1234 \
    --gradient_checkpointing True \
    --cache_dir ${cache_dir} \
    --output_dir ${output_dir_lora}

After training, the pytorch_model.bin saved under checkpoint-step is 14 GB. It looks like the weight file of the full model rather than a LoRA weight file, which should only be a few tens of MB. Why is that?
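
For a rough sense of scale, here is a back-of-the-envelope size estimate. It is only a sketch: the parameter count, hidden size, layer count, LoRA rank, and the choice of adapting only the fused QKV projection are all assumptions (BLOOM-7B1-like values), not numbers read from configs/lora_config_bloom.json.

```python
# Back-of-the-envelope checkpoint sizes.
# Every model dimension here is an ASSUMPTION (BLOOM-7B1-like), not a value
# taken from the BELLE repo or its LoRA config.
FP16_BYTES = 2


def full_model_bytes(n_params: int, bytes_per_param: int = FP16_BYTES) -> int:
    """Size of a checkpoint that stores every model parameter."""
    return n_params * bytes_per_param


def lora_params(hidden: int, n_layers: int, rank: int, out_mult: int = 3) -> int:
    """LoRA parameter count when only a fused QKV projection is adapted.

    Per layer: A has shape (rank, hidden), B has shape (out_mult * hidden, rank).
    """
    per_layer = rank * hidden + out_mult * hidden * rank
    return per_layer * n_layers


full = full_model_bytes(7_070_000_000)            # assumed ~7.07B parameters
adapter = lora_params(4096, 30, 8) * FP16_BYTES   # assumed hidden=4096, 30 layers, r=8

print(f"full model  : {full / 1e9:.1f} GB")
print(f"LoRA adapter: {adapter / 1e6:.1f} MB")
```

Under these assumptions a full fp16 checkpoint is about 14 GB while the adapter is only megabytes, so a 14 GB pytorch_model.bin matches the size of the whole base model rather than a LoRA-only file.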

Jun 02 '23 08:06 davidie

I am on V100 GPUs. The problem above occurs with ZeRO-3. After switching to ZeRO-2, the saved LoRA weights are only a few tens of MB.
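
In case it helps anyone trying the same switch, here is a minimal sketch of deriving a ZeRO-2 config from the existing ZeRO-3 one. It assumes the file uses DeepSpeed's standard "zero_optimization" section; the helper names and the output filename are hypothetical, and I have not checked it against the actual contents of configs/deepspeed_config_stage3.json.

```python
import copy
import json


def to_zero2(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed config with ZeRO stage set to 2.

    Keys that only apply to stage 3 ("stage3_*" and "offload_param")
    are dropped; everything else is left untouched.
    """
    cfg = copy.deepcopy(ds_config)
    zero = cfg.setdefault("zero_optimization", {})
    zero["stage"] = 2
    for key in list(zero):
        if key.startswith("stage3_") or key == "offload_param":
            del zero[key]
    return cfg


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a config from src_path and write the ZeRO-2 variant to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        cfg = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(to_zero2(cfg), f, indent=2)


# Example call (the output filename is hypothetical):
# convert_file("configs/deepspeed_config_stage3.json",
#              "configs/deepspeed_config_stage2.json")
```

The new file would then be passed to --deepspeed in place of the stage-3 config.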

Jun 02 '23 13:06 davidie

Same question here. In addition, when I load the LoRA model's weights, the lora_B weights are all 0.
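
For context, LoRA initializes the B matrices to zero, so an all-zero lora_B means the loaded adapter adds nothing on top of the base model. Below is a small sketch for listing which lora_B tensors in a checkpoint are all zero. It assumes parameter names contain the substring "lora_B" (as PEFT-style checkpoints usually do) and that tensors expose .any(), as torch.Tensor does; the function names and the example path are hypothetical.

```python
def zero_lora_b_keys(state_dict):
    """Return the names of lora_B tensors whose entries are all zero.

    Works with any mapping whose values expose .any() (e.g. torch.Tensor).
    Empty tensors are reported too, since .any() is False for them.
    """
    return [
        name
        for name, tensor in state_dict.items()
        if "lora_B" in name and not bool(tensor.any())
    ]


def load_state_dict(path):
    """Load a checkpoint onto the CPU. Requires PyTorch."""
    import torch  # imported lazily so the helper above has no hard dependency

    return torch.load(path, map_location="cpu")


# Example (the path is hypothetical):
# sd = load_state_dict("checkpoint-step/pytorch_model.bin")
# print(zero_lora_b_keys(sd))
```

If every lora_B key is listed, the file holds untrained (or never gathered) adapter weights. The check alone does not say which of the two it is.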

Aug 16 '23 04:08 hurun