Marcovaldon
I tried adding an extra padding step before the concatenation, but then there is another error: > Traceback (most recent call last): File "finetune.py", line 312, in train() File "finetune.py", line...
@myownskyW7 @yuhangzang I trained with only one sample and batch>1, and I still get the same error as above.
> Can you provide more details on how you add the padding?

I implement the padding right after https://huggingface.co/internlm/internlm-xcomposer2-vl-7b/blob/main/modeling_internlm_xcomposer2.py#L266, as below:

`longest_token_num = max([wrap_embeds_list[j].shape[1] for j in range(len(img_list))]) for...`
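Since the snippet above is cut off, here is a minimal sketch of what such a pre-concat padding could look like, assuming the four per-sample lists (`wrap_embeds_list`, `wrap_atts_list`, `wrap_target_list`, `wrap_im_mask_list`) built in `interleav_wrap`; the zero fills for the attention/image masks and the `-100` fill for the targets are my assumptions, not the repo's code:

```python
import torch

def pad_and_stack(wrap_embeds_list, wrap_atts_list, wrap_target_list, wrap_im_mask_list):
    # Right-pad every per-sample tensor (shape [1, seq_len, ...]) to the longest
    # sequence length in the batch, then concatenate along the batch dimension.
    longest_token_num = max(e.shape[1] for e in wrap_embeds_list)
    embeds, atts, targets, im_masks = [], [], [], []
    for emb, att, tgt, im in zip(wrap_embeds_list, wrap_atts_list,
                                 wrap_target_list, wrap_im_mask_list):
        pad_len = longest_token_num - emb.shape[1]
        pad_emb = emb.new_zeros(1, pad_len, emb.shape[2])   # zeros for embeddings
        pad_att = att.new_zeros(1, pad_len)                 # 0 so attention ignores padding
        pad_tgt = tgt.new_full((1, pad_len), -100)          # -100 so the loss skips padding (assumption)
        pad_im = im.new_zeros(1, pad_len)                   # 0 = text position (assumption)
        embeds.append(torch.cat([emb, pad_emb], dim=1))
        atts.append(torch.cat([att, pad_att], dim=1))
        targets.append(torch.cat([tgt, pad_tgt], dim=1))
        im_masks.append(torch.cat([im, pad_im], dim=1))
    return (torch.cat(embeds, dim=0), torch.cat(atts, dim=0),
            torch.cat(targets, dim=0), torch.cat(im_masks, dim=0))
```

The returned tensors would then replace the plain `torch.cat` calls around #L266; the key point is that padded positions are masked out of both the attention mask and the loss.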
> You may avoid in-place variable replacement and define other variables instead.

But I deleted my own changes and trained with the original code, using a single sample repeated 100k times as the dataset, and with batch>1 the same error message still appears.
> I mean you may avoid the in-place replacement such as `wrap_embeds_list[i] = torch.cat([wrap_embeds_list[i], pad1], dim=1)`, and define new variables, e.g., `wrap_embeds_list_new[i] = torch.cat([wrap_embeds_list[i], pad1], dim=1)`

I tried that and still get the same error message. May I ask whether the authors used batch>1 when finetuning? I also ran an experiment with the completely unmodified code: the training data is the same case repeated 100k times, so assembling the batch at https://huggingface.co/internlm/internlm-xcomposer2-vl-7b/blob/main/modeling_internlm_xcomposer2.py#L266 goes through (sequences built from the same case naturally have the same length), but the following problem still shows up during backpropagation: >...
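For reference, a self-contained toy version of the non-in-place pattern suggested above (the dummy shapes and zero padding are chosen just for illustration):

```python
import torch

# Dummy per-sample embeddings of different lengths, standing in for wrap_embeds_list.
wrap_embeds_list = [torch.randn(1, 5, 8), torch.randn(1, 7, 8)]
longest_token_num = max(e.shape[1] for e in wrap_embeds_list)

# Suggested pattern: append padded tensors to a new list instead of
# overwriting wrap_embeds_list[i] in place.
wrap_embeds_list_new = []
for emb in wrap_embeds_list:
    pad1 = emb.new_zeros(1, longest_token_num - emb.shape[1], emb.shape[2])
    wrap_embeds_list_new.append(torch.cat([emb, pad1], dim=1))

wrap_embeds = torch.cat(wrap_embeds_list_new, dim=0)  # shape [2, 7, 8]
```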