InternLM-XComposer
Bug: Dimension mismatch when preparing the decoder attention mask.
The issue is present in both internlm-xcomposer2-vl-7b and internlm-xcomposer2-7b: there is a dimension mismatch between attention_mask and combined_attention_mask. The code was cloned directly from GitHub without any modifications.
Here is the error log:
Position interpolate from 24x24 to 16x16
Some weights of InternLMXComposer2ForCausalLM were not initialized from the model checkpoint at internlm/internlm-xcomposer2-7b and are newly initialized: ['vit.vision_tower.vision_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\transformers\generation\utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Traceback (most recent call last):
File "G:\CODES\Misee\InternLM-XComposer\examples\example_chat.py", line 34, in <module>
response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\wll/.cache\huggingface\modules\transformers_modules\internlm\internlm-xcomposer2-7b\d7ab428de9dc92ea1df7763bf4723ac76c181da1\modeling_internlm_xcomposer2.py", line 510, in chat
outputs = self.generate(
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\transformers\generation\utils.py", line 1522, in generate
return self.greedy_search(
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\transformers\generation\utils.py", line 2339, in greedy_search
outputs = self(
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\wll/.cache\huggingface\modules\transformers_modules\internlm\internlm-xcomposer2-7b\d7ab428de9dc92ea1df7763bf4723ac76c181da1\modeling_internlm_xcomposer2.py", line 366, in forward
outputs = self.model(
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\wll\anaconda3\envs\Misee\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\wll/.cache\huggingface\modules\transformers_modules\internlm\internlm-xcomposer2-7b\d7ab428de9dc92ea1df7763bf4723ac76c181da1\modeling_internlm2.py", line 885, in forward
attention_mask = self._prepare_decoder_attention_mask(
File "C:\Users\wll/.cache\huggingface\modules\transformers_modules\internlm\internlm-xcomposer2-7b\d7ab428de9dc92ea1df7763bf4723ac76c181da1\modeling_internlm2.py", line 821, in _prepare_decoder_attention_mask
expanded_attn_mask + combined_attention_mask)
RuntimeError: The size of tensor a (376) must match the size of tensor b (375) at non-singleton dimension 3
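For anyone debugging this: the error suggests the attention mask passed through generation is one token shorter (375) than the sequence length the model computes after the image embeddings are merged into the text embeddings (376). A framework-free sketch of that kind of off-by-one and the padding workaround, where all names and token counts are illustrative assumptions, not the actual model internals:

```python
def combined_seq_len(text_len, num_image_tokens):
    # Length of the sequence after image embeddings are spliced
    # into the text embeddings (hypothetical accounting).
    return text_len + num_image_tokens

def pad_mask(mask, target_len, pad_value=1):
    # Right-pad a 1-D attention mask so its length matches the
    # combined sequence length; this is the kind of adjustment a
    # fix inside _prepare_decoder_attention_mask would need.
    if len(mask) < target_len:
        mask = mask + [pad_value] * (target_len - len(mask))
    return mask

text_len = 119            # tokens in the text prompt (example value)
num_image_tokens = 257    # e.g. ViT patch tokens (assumed count)
mask = [1] * 375          # mask built before image tokens were counted

target = combined_seq_len(text_len, num_image_tokens)  # 376
fixed = pad_mask(mask, target)
assert len(fixed) == target
```

If the mismatch really is this off-by-one, checking where the mask is built (before vs. after image-token insertion) in modeling_internlm2.py would confirm it.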
same question
same question. It occurs when I run a text-image comprehension dialogue, while a plain-text dialogue works fine.
same question
Has anyone solved this? I have the same issue.
same question
How do I solve this problem? Any advice would be appreciated.