InfLLM

Error when reproducing mistral results

Open yuanyehome opened this issue 1 year ago • 2 comments

When trying to reproduce the original-model results of mistral-7b-v0.2 without flash-attn, I get the following error:

Traceback (most recent call last):
  File "/home/yuanye/long_llm/InfLLM/benchmark/pred.py", line 330, in <module>
    preds = get_pred(
  File "/home/yuanye/long_llm/InfLLM/benchmark/pred.py", line 263, in get_pred
    output = searcher.generate(
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/greedy_search.py", line 32, in generate
    result = self._decode(input_ids, **kwargs)
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/greedy_search.py", line 54, in _decode
    out = self.model(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1065, in forward
    outputs = self.model(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/patch.py", line 102, in model_forward
    layer_outputs = decoder_layer(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 528, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yuanye/long_llm/InfLLM/inf_llm/utils/patch.py", line 16, in hf_forward
    ret = forward(
  File "/home/yuanye/long_llm/InfLLM/inf_llm/attention/origin.py", line 49, in forward
    score = torch.matmul(h_q, h_k.transpose(-1, -2))
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1

It seems that inf_llm/attention/origin.py does not support Mistral's grouped-query attention (GQA): the query tensor has 32 heads while the key/value tensors have only 8, so the matmul fails. How can I fix this?
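For reference, the mismatch can be reproduced in isolation. With Mistral-7B's 32 query heads and 8 key/value heads, a plain matmul between the per-head tensors fails exactly as in the traceback (the sequence length and head dim below are just illustrative values):

```python
import torch

# Mistral-7B uses GQA: 32 query heads share 8 key/value heads.
h_q = torch.randn(1, 32, 4, 128)  # (batch, num_heads, seq_len, head_dim)
h_k = torch.randn(1, 8, 4, 128)   # (batch, num_kv_heads, seq_len, head_dim)

try:
    torch.matmul(h_q, h_k.transpose(-1, -2))  # heads don't broadcast: 32 vs 8
except RuntimeError as e:
    print(e)
```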

yuanyehome avatar Nov 08 '24 03:11 yuanyehome

Hi, you can call repeat_kv from inf_llm/attention/utils.py to expand the key/value heads before the qk computation.
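For context, here is a minimal sketch of what such a repeat_kv helper does, mirroring the well-known implementation in HF transformers' modeling_mistral.py (the exact signature in inf_llm/attention/utils.py may differ): each KV head is tiled n_rep times so the head counts line up with the query before the matmul in origin.py.

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, num_kv_heads, seq_len, head_dim) to
    (batch, num_kv_heads * n_rep, seq_len, head_dim) by repeating each KV head."""
    batch, num_kv_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, slen, head_dim)

# Before the qk matmul, expand k (and likewise v) to match q's head count:
h_q = torch.randn(1, 32, 4, 128)
h_k = torch.randn(1, 8, 4, 128)
h_k = repeat_kv(h_k, h_q.size(1) // h_k.size(1))  # 8 heads -> 32 heads
score = torch.matmul(h_q, h_k.transpose(-1, -2))  # shape (1, 32, 4, 4)
```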

guyan364 avatar Nov 08 '24 03:11 guyan364

> Hi, you can add repeat_kv from inf_llm/attention/utils.py before the qk computation.

I've found it. Thanks a lot!

yuanyehome avatar Nov 08 '24 03:11 yuanyehome