IndexError: index 0 is out of bounds for dimension 0 with size 0, raised at the line "if cache_position is None or (cache_position is not None and cache_position[0] == 0):"
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 38/38 [00:33<00:00, 1.14it/s]
tensor([], device='cuda:0', dtype=torch.int64)
Traceback (most recent call last):
File "/home/xzy/xjy/qwen/test.py", line 55, in
I met the same problem as you, have you solved it?
Sorry, I have not solved it.
+1
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 38/38 [04:06<00:00, 6.48s/it]
Traceback (most recent call last):
File "/picassox/sfs-mtlab-train-base/segmentation/lzj7/qwen_caption.py", line 46, in
same issue
I also hit this error. When I ran SFT starting from qwen2-vl-instruct, inference with the resulting model was correct, but when I ran SFT starting from qwen2-vl-base, inference with the resulting model failed.
Did you solve this problem?
I found the problem comes from the inputs: when I printed the inputs, the text token tensor was empty.
inputs = self.processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
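A quick guard for this failure mode (a sketch, not from the original script; check_inputs is a hypothetical helper): if the chat template renders no text, input_ids comes back empty, and the later cache_position[0] lookup fails with the opaque IndexError above. Plain lists stand in for tensors here so the sketch runs without torch:

```python
def check_inputs(input_ids):
    """Fail early with a clear message instead of the IndexError inside generate()."""
    if len(input_ids) == 0 or len(input_ids[0]) == 0:
        raise ValueError(
            "input_ids is empty - check that the chat template rendered the prompt"
        )
    return input_ids

# An empty batch (what the broken chat_template produces) is rejected:
try:
    check_inputs([[]])
except ValueError as e:
    print("caught:", e)

# A non-empty batch passes through unchanged:
print(check_inputs([[151644, 8948]]))
```

Calling this right after the processor makes the root cause visible at the input-preparation step rather than deep inside generation.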
Has anyone solved this problem?
Maybe you downloaded the wrong model; check whether you downloaded "Qwen2-VL-*B-Instruct" or "Qwen/Qwen2-VL-*B".
I found this problem occurs under FSDP training. After setting CUDA_LAUNCH_BLOCKING=1, the error actually turns out to be on the previous line:
input_ids = input_ids[:, cache_position]
Qwen2VLForConditionalGeneration inherits from GenerationMixin; comparing the two prepare_inputs_for_generation implementations:
- Qwen2VLForConditionalGeneration#prepare_inputs_for_generation
- GenerationMixin#prepare_inputs_for_generation
GenerationMixin has an extra description of Exception 3 and the corresponding condition:
# 2. Generic cache-dependent input preparation
# If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
# Exception 1: when passing input_embeds, input_ids may be missing entries
# Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
# Exception 3: with synced GPUs cache_position may go out of bounds, but we only want dummy token in that case.
# (we can't check exception 3 while compiling)
...
if (
    inputs_embeds is not None  # Exception 1
    or (is_torchdynamo_compiling() or cache_position[-1] >= input_ids.shape[1])  # Exception 3
):
    input_ids = input_ids[:, -cache_position.shape[0] :]
After adding this missing condition, inference works normally.
Dependency versions:
transformers 4.47.1
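The slicing logic described above can be sketched torch-free with plain lists (an illustration of the branch structure, not the actual transformers source; is_torchdynamo_compiling is assumed False here):

```python
def slice_input_ids(input_ids, cache_position, inputs_embeds=None):
    """Mimic the cache-dependent input slicing in prepare_inputs_for_generation."""
    seq_len = len(input_ids[0])
    if (
        inputs_embeds is not None  # Exception 1
        or (len(cache_position) > 0 and cache_position[-1] >= seq_len)  # Exception 3
    ):
        # Keep only the trailing len(cache_position) tokens.
        return [row[-len(cache_position):] for row in input_ids]
    # Default: gather by position - this is the line that crashed under FSDP
    # when cache_position pointed past the end of input_ids.
    return [[row[i] for i in cache_position] for row in input_ids]

# With synced GPUs, cache_position can exceed the sequence length; the
# Exception 3 branch avoids the out-of-bounds gather:
print(slice_input_ids([[1, 2, 3]], [3]))       # -> [[3]]
print(slice_input_ids([[1, 2, 3]], [0, 1, 2])) # -> [[1, 2, 3]]
```

Without the Exception 3 branch, the second return line would index position 3 of a length-3 row and raise, which matches the reported traceback.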
@steermomo Can you be more specific? I'm a rookie.
I encountered the same problem. The root cause is that the chat_template of Qwen2-VL and Qwen2-VL-Instruct is different, which causes the input_ids to be empty before inference. So the solution is to replace the chat_template.json of Qwen2-VL with the chat_template.json of Instruct. It solved my problem.
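The workaround described above amounts to copying the Instruct model's chat_template.json over the base model's. A small sketch (paths and the helper name are placeholders; the demo uses throwaway directories, while real usage would point at your two local model snapshot directories):

```python
import json
import os
import shutil
import tempfile

def fix_chat_template(base_dir, instruct_dir):
    """Overwrite base model's chat_template.json with the Instruct model's copy."""
    src = os.path.join(instruct_dir, "chat_template.json")
    dst = os.path.join(base_dir, "chat_template.json")
    shutil.copyfile(src, dst)
    return dst

# Demo with temporary directories standing in for the two model folders:
base = tempfile.mkdtemp()
inst = tempfile.mkdtemp()
with open(os.path.join(inst, "chat_template.json"), "w") as f:
    json.dump({"chat_template": "{% for message in messages %}..."}, f)

dst = fix_chat_template(base, inst)
print(os.path.isfile(dst))  # -> True
```

For models downloaded via huggingface_hub, the file normally sits at the top level of the downloaded model directory alongside config.json.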
Thanks! It works for me.
What task are you working on? Would you like to exchange ideas?
Thank you, can someone please fix this? I almost ditched this model.
where is this file stored? can you share steps to fix the problem please?
Thanks! It was indeed caused by chat_template.json!