[BUG] Code error in resampler.py in the model files

Open · kevin236-max opened this issue 3 months ago · 2 comments

Is there an existing issue / discussion for this?

  • [x] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • [x] I have searched FAQ

Current Behavior

    def batch_attn_forward(self, q, k, v, pos_embed_temporal, temporal_ids, key_padding_mask):
        bs = k.shape[0]  # bug: k is L * B * D, so the batch size is k.shape[1]
        if pos_embed_temporal:
            k += torch.stack(pos_embed_temporal, dim=0)
            bs = len(temporal_ids)
            merge_k = []
            merge_v = []
            merge_key_padding_mask = []

            start = 0
            for tp in temporal_ids:
                end = start + len(tp)
                # L * (end-start) * D -> (end-start) * L * D -> 1 * L*(end-start) * D
                merge_k.append(k[:, start: end, :].permute(1, 0, 2).reshape(-1, self.embed_dim))
                merge_v.append(v[:, start: end, :].permute(1, 0, 2).reshape(-1, self.embed_dim))
                merge_key_padding_mask.append(key_padding_mask[start: end, :].reshape(-1, 1))

                start = end
                            
            k = torch.nn.utils.rnn.pad_sequence(merge_k, batch_first=True, padding_value=0.0).permute(1, 0, 2)  # L*(end-start)
            v = torch.nn.utils.rnn.pad_sequence(merge_v, batch_first=True, padding_value=0.0).permute(1, 0, 2)  # L*(end-start)
            key_padding_mask = torch.nn.utils.rnn.pad_sequence(merge_key_padding_mask, batch_first=True, padding_value=True).squeeze(-1)

        out = self.attn(
            self._repeat(q, bs),  # Q * B * D
            k,  # L * B * D +  L * B * D
            v,
            key_padding_mask=key_padding_mask)[0]

        return out

Expected Behavior

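The first line should set bs = k.shape[1]. k is laid out as L * B * D (sequence first), so k.shape[0] is the number of image tokens L, not the batch size. If the caller does not pass temporal_ids, the branch that resets bs is skipped, bs stays equal to the image token length, and self._repeat(q, bs) builds a query batch that does not match the batch dimension of k.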
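
A minimal sketch of why the batch size has to come from k.shape[1]. It assumes the resampler's attention is sequence-first like the default torch.nn.MultiheadAttention (which the `L * B * D` comments suggest) and that `_repeat(q, n)` expands a (Q, D) query to (Q, n, D). The sizes and the names `repeat_queries` and `non_temporal_forward` are hypothetical; the function only reproduces the path taken when no temporal embeddings are passed.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
L, B, Q, D = 64, 2, 8, 32  # image tokens, batch size, queries, embed dim

# Sequence-first attention (batch_first=False is the nn.MultiheadAttention default),
# matching the "L * B * D" layout assumed in resampler.py.
attn = nn.MultiheadAttention(D, num_heads=4)


def repeat_queries(q, n):
    # Assumed equivalent of Resampler._repeat: (Q, D) -> (Q, n, D)
    return q.unsqueeze(1).repeat(1, n, 1)


def non_temporal_forward(q, k, v, key_padding_mask, use_fix):
    # Path without temporal embeddings: bs is never overwritten by len(temporal_ids).
    bs = k.shape[1] if use_fix else k.shape[0]
    out, _ = attn(repeat_queries(q, bs), k, v, key_padding_mask=key_padding_mask)
    return out


q = torch.randn(Q, D)
k = torch.randn(L, B, D)
v = torch.randn(L, B, D)
mask = torch.zeros(B, L, dtype=torch.bool)  # False = every key token is attended to

out = non_temporal_forward(q, k, v, mask, use_fix=True)
print(tuple(out.shape))  # -> (8, 2, 32)

try:
    # bs == L == 64 here, but k only has a batch dimension of 2
    non_temporal_forward(q, k, v, mask, use_fix=False)
except (RuntimeError, AssertionError) as err:
    print("buggy path fails:", type(err).__name__)
```

With bs = k.shape[1] the output has shape Q * B * D; with bs = k.shape[0] the query batch (L) disagrees with the key batch (B), and the attention call raises a shape error (the exact exception type may vary between PyTorch versions).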

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

kevin236-max · Sep 17 '25 13:09

This has already been fixed in the code in the Hugging Face repository.

wzr0108 · Sep 18 '25 05:09

Hi, please update to the latest code from the Hugging Face repository.

YuzaChongyi · Sep 18 '25 05:09