[Bug] Model fails to load after LoRA fine-tuning

Open Littlew69 opened this issue 9 months ago • 0 comments

Checklist

[ ] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.
[ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

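After fine-tuning the model with sh shell/internvl2.5/2nd_finetune/internvl2_5_4b_dynamic_res_2nd_finetune_lora.sh, the resulting checkpoint cannot be loaded directly with the following code:

    import torch
    from transformers import AutoModel, AutoTokenizer

    path = './InternVL2_5-4B-lora'
    model = AutoModel.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        use_flash_attn=True,
        trust_remote_code=True).eval().cuda()
    tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

Loading fails with NotImplementedError: Cannot copy out of meta tensor; no data!. I also merged the LoRA parameters following the instructions in "Enhancing InternVL2 on COCO Caption Using LoRA Fine-Tuning", and loading the merged model raises the same error.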

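During the merge step, adding device_map="auto" to the InternVLChatModel.from_pretrained call that loads the model lets the merge complete, but inference is then much slower than with the model that was not fine-tuned.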
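One thing that may be worth checking is whether the checkpoint directory being loaded still contains unmerged LoRA tensors. The sketch below is not part of InternVL. It reads only the JSON header of each .safetensors file with the standard library and counts tensor names containing lora_A or lora_B, which is the naming PEFT normally uses for adapter weights (an assumption about how this checkpoint was saved). The helper names are hypothetical.

```python
import json
import struct
from pathlib import Path


def list_safetensors_keys(path):
    """Return the tensor names stored in a .safetensors file (header only)."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian header length,
        # followed by that many bytes of JSON describing the tensors.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(k for k in header if k != "__metadata__")


def split_lora_keys(keys, markers=("lora_A", "lora_B")):
    """Split tensor names into (adapter_keys, other_keys).

    Assumption: adapter weights follow PEFT's usual naming,
    e.g. '...q_proj.lora_A.default.weight'.
    """
    lora = [k for k in keys if any(m in k for m in markers)]
    lora_set = set(lora)
    other = [k for k in keys if k not in lora_set]
    return lora, other


def summarize_checkpoint(ckpt_dir):
    """Map each shard file name to (num_adapter_tensors, num_other_tensors)."""
    summary = {}
    for shard in sorted(Path(ckpt_dir).glob("*.safetensors")):
        lora, other = split_lora_keys(list_safetensors_keys(shard))
        summary[shard.name] = (len(lora), len(other))
    return summary
```

Calling summarize_checkpoint('./InternVL2_5-4B-lora') on the fine-tuned output and on the merged output would show whether adapter tensors are still present in either directory. A non-zero adapter count after merging would suggest that the merge did not take effect.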

Reproduction

set -x

GPUS=${GPUS:-4}
BATCH_SIZE=${BATCH_SIZE:-4}
PER_DEVICE_BATCH_SIZE=${PER_DEVICE_BATCH_SIZE:-1}
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))

export PYTHONPATH="${PYTHONPATH}:$(pwd)"
export MASTER_PORT=34229
export TF_CPP_MIN_LOG_LEVEL=3
export LAUNCHER=pytorch

OUTPUT_DIR='./internvl_checkpoints/internvl_chat_v2_5/internvl2_5_4b_dynamic_res_2nd_finetune_lora'

if [ ! -d "$OUTPUT_DIR" ]; then
  mkdir -p "$OUTPUT_DIR"
fi

# number of gpus: 2
# batch size per gpu: 4
# gradient accumulation steps: 2
# total batch size: 16
# epoch: 1

torchrun \
  --nnodes=1 \
  --node_rank=0 \
  --master_addr=127.0.0.1 \
  --nproc_per_node=${GPUS} \
  --master_port=${MASTER_PORT} \
  internvl/train/internvl_chat_finetune.py \
  --model_name_or_path "OpenGVLab/InternVL2_5-4B" \
  --conv_style "internvl2_5" \
  --use_fast_tokenizer False \
  --output_dir ${OUTPUT_DIR} \
  --meta_path "./InternVL_process/finetune_lora_json/ng-4-1-ok-4-1-bbox.json" \
  --overwrite_output_dir True \
  --force_image_size 448 \
  --max_dynamic_patch 6 \
  --down_sample_ratio 0.5 \
  --drop_path_rate 0.0 \
  --freeze_llm True \
  --freeze_mlp True \
  --freeze_backbone True \
  --use_llm_lora 16 \
  --vision_select_layer -1 \
  --dataloader_num_workers 4 \
  --bf16 True \
  --num_train_epochs 1 \
  --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
  --gradient_accumulation_steps ${GRADIENT_ACC} \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 200 \
  --save_total_limit 1 \
  --learning_rate 4e-5 \
  --weight_decay 0.01 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --max_seq_length 8192 \
  --do_train True \
  --grad_checkpoint True \
  --group_by_length True \
  --dynamic_image_size True \
  --use_thumbnail True \
  --ps_version 'v2' \
  --deepspeed "zero_stage1_config.json" \
  --report_to "tensorboard" \
  2>&1 | tee -a "${OUTPUT_DIR}/training_log.txt"
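For reference, a small sketch of the batch-size arithmetic used in the script above. With the script's default values it gives 1 accumulation step and an effective batch size of 4, whereas the notes in the script (2 GPUs, batch size 4 per GPU, 2 accumulation steps) describe a total of 16. Whether the defaults were overridden through environment variables is not stated here, so the numbers only cover the defaults. The function names are illustrative.

```python
def grad_accum_steps(batch_size, per_device_batch_size, gpus):
    """Mirror the shell expression $((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))."""
    # Shell arithmetic is integer division, evaluated left to right.
    return batch_size // per_device_batch_size // gpus


def effective_batch_size(per_device_batch_size, gpus, accum_steps):
    """Number of samples contributing to one optimizer step."""
    return per_device_batch_size * gpus * accum_steps


# Defaults from the script: GPUS=4, BATCH_SIZE=4, PER_DEVICE_BATCH_SIZE=1.
steps = grad_accum_steps(batch_size=4, per_device_batch_size=1, gpus=4)
print(steps, effective_batch_size(1, 4, steps))  # prints: 1 4

# Values from the notes in the script: 2 GPUs, 4 per GPU, 2 accumulation steps.
print(effective_batch_size(4, 2, 2))  # prints: 16
```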

Environment

Using the Python==3.9 environment described in the documentation.

Error traceback

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "~/InternVL_process/inference_with_transformers.py", line 160, in <module>
    model = AutoModel.from_pretrained(
  File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2567, in cuda
    return super().cuda(*args, **kwargs)
  File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 918, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 918, in <lambda>
    return self._apply(lambda t: t.cuda(device))
NotImplementedError: Cannot copy out of meta tensor; no data!
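Since the traceback fails inside .cuda() on a tensor without data, it may help to list which parameters are still on the meta device before moving the model. The helper below is a minimal sketch, not part of transformers or InternVL. It only assumes an object with named_parameters() and named_buffers() like torch.nn.Module, and checks each tensor's is_meta attribute. The function name is hypothetical.

```python
def find_meta_parameters(model):
    """Return names of parameters and buffers that still live on the meta device.

    Works on any object exposing named_parameters() / named_buffers()
    in the style of torch.nn.Module; tensors are checked via `is_meta`.
    """
    names = [n for n, p in model.named_parameters() if p.is_meta]
    names += [n for n, b in model.named_buffers() if b.is_meta]
    return names


# Hypothetical usage (needs torch and transformers, not executed here):
#
#   model = AutoModel.from_pretrained(
#       path, torch_dtype=torch.bfloat16,
#       low_cpu_mem_usage=True, trust_remote_code=True)
#   print(find_meta_parameters(model)[:20])  # inspect before calling .cuda()
```

If the returned names all belong to one component (for example only LoRA layers, or only the language model), that narrows down which weights were not found in the checkpoint.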

Mar 26 '25 09:03 Littlew69