airllm
RuntimeError: shape '[1, 5, 8, 128]' is invalid for input of size 10240 (Llama 3.1 405B 4-bit, fails on layer 1)
System specs: Ryzen 5 5600G, NVIDIA Tesla M40 24 GB, 128 GB DDR4 RAM
Error:
running layers(cuda:0): 1%|▍ | 1/129 [00:06<14:44, 6.91s/it]
Traceback (most recent call last):
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\inference_405B_4bit.py", line 14, in <module>
generation_output = model.generate(
^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\generation\utils.py", line 2024, in generate
result = self._sample(
^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\generation\utils.py", line 2982, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\airllm\airllm_base.py", line 369, in __call__
return self.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\airllm\airllm_base.py", line 569, in forward
new_seq = layer(seq, **kwargs)[0]
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 734, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 622, in forward
key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 5, 8, 128]' is invalid for input of size 10240
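For reference, the failing view expects bsz * q_len * num_key_value_heads * head_dim = 1 * 5 * 8 * 128 = 5120 elements, but the key projection produced 10240, exactly twice as many. A minimal sketch reproducing the mismatch with plain torch (numbers taken from the traceback; the factor of two suggests the quantized checkpoint's key projection width does not match the config being used, which is an assumption, not a confirmed diagnosis):

```python
import torch

# Numbers from the traceback above.
bsz, q_len = 1, 5
num_key_value_heads, head_dim = 8, 128

expected = bsz * q_len * num_key_value_heads * head_dim
print(expected)           # 5120
print(10240 // expected)  # 2 -- the layer output is exactly twice as large

# Reproduce the failing view with a dummy tensor of the reported size.
key_states = torch.zeros(bsz, q_len, 10240 // (bsz * q_len))
try:
    key_states.view(bsz, q_len, num_key_value_heads, head_dim)
except RuntimeError as e:
    print(e)  # shape '[1, 5, 8, 128]' is invalid for input of size 10240
```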
My inference code:
from airllm import AutoModel

model = AutoModel.from_pretrained("unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit",
                                  delete_original=True)

input_text = input("Prompt the almighty 405B Llama: ")

input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=128,
                               padding=False)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=10,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
pip list:
(NeuralSliceEnv) C:\Users\Darkl\OneDrive\source\repos\NeuralSlice>pip list
Package Version
------------------ ------------
accelerate 0.33.0
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
airllm 2.10.2
attrs 24.2.0
bitsandbytes 0.43.3
certifi 2024.7.4
charset-normalizer 3.3.2
colorama 0.4.6
coloredlogs 15.0.1
datasets 2.21.0
dill 0.3.8
filelock 3.15.4
frozenlist 1.4.1
fsspec 2024.6.1
huggingface-hub 0.24.6
humanfriendly 10.0
idna 3.8
Jinja2 3.1.4
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
numpy 1.26.4
optimum 1.21.4
packaging 24.1
pandas 2.2.2
pillow 10.2.0
pip 24.2
protobuf 5.27.3
psutil 6.0.0
pyarrow 17.0.0
pyreadline3 3.4.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.2
regex 2024.7.24
requests 2.32.3
safetensors 0.4.4
scipy 1.14.1
sentencepiece 0.2.0
setuptools 65.5.0
six 1.16.0
sympy 1.13.2
tokenizers 0.19.1
torch 2.4.0+cu121
torchaudio 2.4.0+cu121
torchvision 0.19.0+cu121
tqdm 4.66.5
transformers 4.44.2
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
xxhash 3.5.0
yarl 1.9.4