
[BUG] LoRA finetuning error

Open Luoyang144 opened this issue 1 year ago • 2 comments

Is there an existing issue / discussion for this?

  • [X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • [X] I have searched FAQ

Current Behavior

I am LoRA-finetuning the model on a dataset I built myself, but at a certain training step the following error occurs:

 File "/xxx/.cache/huggingface/modules/transformers_modules/Qwen-VL-Chat/modeling_qwen.py", line 557, in forward
    assert (bos_pos[0] == eos_pos[0]).all()
RuntimeError: The size of tensor a (11) must match the size of tensor b (10) at non-singleton dimension 0
^M 24%|██▎       | 36/152 [22:57<1:13:57, 38.25s/it]```
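For context, line 557 of modeling_qwen.py pairs every image-start special token in the batch with an image-end token before splicing in the visual features, and the assertion fires when the counts disagree. A minimal, hedged reproduction of that size mismatch (the token ID below is an assumption; Qwen-VL reads it from `config.visual["image_start_id"]` and, as far as the published code shows, derives the end ID as start + 1):

```python
import torch

IMAGE_START_ID = 151857            # assumed value; check config.visual["image_start_id"]
IMAGE_END_ID = IMAGE_START_ID + 1  # the end ID appears to be derived as start + 1

# A batch where truncation cut off the closing token of the second image:
input_ids = torch.tensor([[11, IMAGE_START_ID, 22, IMAGE_END_ID,
                           33, IMAGE_START_ID, 44, 55]])

bos_pos = torch.where(input_ids == IMAGE_START_ID)  # two start positions
eos_pos = torch.where(input_ids == IMAGE_END_ID)    # only one end position
# RuntimeError: The size of tensor a (2) must match the size of tensor b (1)
# at non-singleton dimension 0 -- the same shape mismatch as in the traceback.
assert (bos_pos[0] == eos_pos[0]).all()
```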

Expected Behavior

Training completes and the model is saved.

Steps To Reproduce

The environment was installed according to requirements. The script is as follows:

```bash
#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=$(pwd)

MODEL="../../LLMs/Qwen/Qwen-VL-Chat" # "Qwen/Qwen-VL-Chat" / "Qwen/Qwen-VL"; set the path if you do not want to load from huggingface directly

# ATTENTION: specify the path to your training data, which should be a json file
# consisting of a list of conversations.
# See the section for finetuning in README for more information.
#DATA="data/debate/v4/filter_v4_3000_a2r3_CoD.json"
#SAVE_DIT="checkpoints/lora/CoD"
DATA=$1
SAVE_DIT=$2

export CUDA_VISIBLE_DEVICES=0

python finetune.py \
  --model_name_or_path $MODEL \
  --data_path $DATA \
  --bf16 True \
  --fix_vit True \
  --output_dir $SAVE_DIT \
  --num_train_epochs 2 \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 100 \
  --save_total_limit 10 \
  --learning_rate 1e-5 \
  --weight_decay 0.1 \
  --adam_beta2 0.95 \
  --warmup_ratio 0.01 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --report_to "none" \
  --model_max_length 2048 \
  --lazy_preprocess True \
  --gradient_checkpointing \
  --use_lora
```


Environment

```Markdown
- OS: Ubuntu
- Python: 3.9
- Transformers: 4.32.0
- PyTorch: 2.2.0+cu121
- CUDA: 12.2
```

Anything else?

No response

Luoyang144 commented Feb 06 '24

The image is probably placed after the prompt and gets truncated: the model can only see the special token marking the start of the image, not the one marking its end, so the two counts disagree and the assertion fails.
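If that is the cause, the broken samples can be caught before training. A minimal pre-flight sketch, not the repo's own tooling: the token IDs are assumptions (read them from the loaded model's `config.visual` instead), and `render_conversation` is a hypothetical stand-in for however you flatten one training sample to text:

```python
from transformers import AutoTokenizer

MODEL = "Qwen/Qwen-VL-Chat"
MAX_LEN = 2048            # matches --model_max_length in the script above
IMAGE_START_ID = 151857   # assumed; use model.config.visual["image_start_id"]
IMAGE_END_ID = IMAGE_START_ID + 1

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

def image_tokens_balanced(text: str) -> bool:
    """True if every image-start token keeps its end token after truncation."""
    ids = tokenizer(text, truncation=True, max_length=MAX_LEN)["input_ids"]
    return ids.count(IMAGE_START_ID) == ids.count(IMAGE_END_ID)

# Filter the training JSON before finetune.py ever sees it
# (render_conversation is hypothetical -- use your own sample-to-text step):
# samples = [s for s in samples if image_tokens_balanced(render_conversation(s))]
```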

buaacoder commented Mar 13 '24

I ran into the same problem, but it stopped happening once I reduced the number of images per conversation turn (from 30-odd down to six). Could this be an image-count limit?
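Those numbers are consistent with truncation rather than a hard image limit. A back-of-envelope sketch, assuming the 256 visual tokens per image reported in the Qwen-VL paper plus two delimiter tokens, and the `--model_max_length 2048` from the script above:

```python
TOKENS_PER_IMAGE = 256 + 2   # assumed: 256 visual tokens + <img>/</img> delimiters
MODEL_MAX_LENGTH = 2048      # --model_max_length from the script above

for n_images in (6, 30):
    used = n_images * TOKENS_PER_IMAGE
    verdict = "fits in" if used < MODEL_MAX_LENGTH else "exceeds"
    print(f"{n_images:>2} images -> {used} tokens ({verdict} {MODEL_MAX_LENGTH})")
# 6 images (~1548 tokens) leave room for text; 30 images (~7740 tokens) alone
# already overflow the window, so truncation must cut through an image span.
```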

L1NINE commented Aug 26 '24