RuntimeError: shape '[1, 5, 8, 128]' is invalid for input of size 10240 (Llama 405B 4-bit, on layer 1)

Open TitleOS opened this issue 5 months ago • 3 comments

System Specs: Ryzen 5600G, Nvidia Tesla M40 24GB, 128GB DDR4 RAM

Error:

running layers(cuda:0):   1%|▍                                                         | 1/129 [00:06<14:44,  6.91s/it]
Traceback (most recent call last):
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\inference_405B_4bit.py", line 14, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\generation\utils.py", line 2024, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\generation\utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\airllm\airllm_base.py", line 369, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\airllm\airllm_base.py", line 569, in forward
    new_seq = layer(seq, **kwargs)[0]
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 734, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 622, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 5, 8, 128]' is invalid for input of size 10240
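
A quick sanity check on the numbers in that error, as a sketch only. The failing line reshapes `key_states` to `[bsz, q_len, num_key_value_heads, head_dim]`, so the target shape needs 1 × 5 × 8 × 128 = 5120 elements, but the tensor holds 10240, exactly twice that. Assuming `bsz`, `q_len`, and `head_dim` are correct, the tensor is sized for 16 key/value heads rather than the 8 the loaded config reports. The helper names below (`parse_view_error`, `implied_kv_heads`) are made up for illustration and are not part of airllm, transformers, or torch.

```python
import math
import re

# Error text copied verbatim from the traceback.
ERROR = "shape '[1, 5, 8, 128]' is invalid for input of size 10240"


def parse_view_error(message):
    """Return (target_shape, input_size) parsed from a view() error message."""
    match = re.search(
        r"shape '\[([\d,\s]+)\]' is invalid for input of size (\d+)", message
    )
    if match is None:
        raise ValueError("unrecognized error message")
    shape = [int(dim) for dim in match.group(1).split(",")]
    return shape, int(match.group(2))


def implied_kv_heads(shape, input_size):
    """Solve for the head count that would make the view valid.

    Assumption: shape is [bsz, q_len, num_key_value_heads, head_dim] and
    only the head count is wrong; bsz, q_len, and head_dim are trusted.
    """
    bsz, q_len, _, head_dim = shape
    per_head = bsz * q_len * head_dim
    if input_size % per_head != 0:
        raise ValueError("input size is not a whole number of heads")
    return input_size // per_head


shape, input_size = parse_view_error(ERROR)
expected_elements = math.prod(shape)

print("elements the view expects:", expected_elements)
print("elements actually present:", input_size)
print("configured KV heads:", shape[2])
print("KV heads implied by the tensor:", implied_kv_heads(shape, input_size))
```

This is arithmetic on the error message alone. It does not tell whether the checkpoint weights or the config that was loaded is the side that is off, only that the two disagree by a factor of two on the key projection width.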

My inference code:

from airllm import AutoModel

model = AutoModel.from_pretrained("unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit", delete_original=True)

input_text = input("Prompt the almighty 405B Llama: ")

input_tokens = model.tokenizer(input_text,
      return_tensors="pt", 
      return_attention_mask=False, 
      truncation=True, 
      max_length=128, 
      padding=False)

generation_output = model.generate(
      input_tokens['input_ids'].cuda(), 
      max_new_tokens=10,
      return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

PIP List:

(NeuralSliceEnv) C:\Users\Darkl\OneDrive\source\repos\NeuralSlice>pip list
Package            Version
------------------ ------------
accelerate         0.33.0
aiohappyeyeballs   2.4.0
aiohttp            3.10.5
aiosignal          1.3.1
airllm             2.10.2
attrs              24.2.0
bitsandbytes       0.43.3
certifi            2024.7.4
charset-normalizer 3.3.2
colorama           0.4.6
coloredlogs        15.0.1
datasets           2.21.0
dill               0.3.8
filelock           3.15.4
frozenlist         1.4.1
fsspec             2024.6.1
huggingface-hub    0.24.6
humanfriendly      10.0
idna               3.8
Jinja2             3.1.4
MarkupSafe         2.1.5
mpmath             1.3.0
multidict          6.0.5
multiprocess       0.70.16
networkx           3.3
numpy              1.26.4
optimum            1.21.4
packaging          24.1
pandas             2.2.2
pillow             10.2.0
pip                24.2
protobuf           5.27.3
psutil             6.0.0
pyarrow            17.0.0
pyreadline3        3.4.1
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.2
regex              2024.7.24
requests           2.32.3
safetensors        0.4.4
scipy              1.14.1
sentencepiece      0.2.0
setuptools         65.5.0
six                1.16.0
sympy              1.13.2
tokenizers         0.19.1
torch              2.4.0+cu121
torchaudio         2.4.0+cu121
torchvision        0.19.0+cu121
tqdm               4.66.5
transformers       4.44.2
typing_extensions  4.12.2
tzdata             2024.1
urllib3            2.2.2
xxhash             3.5.0
yarl               1.9.4

TitleOS, Aug 31 '24 00:08