text-generation-webui
GPT4 x Alpaca
Describe the bug
Error using the model: gpt-x-alpaca-13b-native-4bit-128g-cuda.pt
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
1. Clone the repository from https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g into the models path.
2. Download gpt-x-alpaca-13b-native-4bit-128g-cuda.pt (8 GB) into that folder.
3. Start the web UI with the arguments --wbits 4 --groupsize 128; without them, a different error occurs (no file named pytorch_model.bin found; see the sketch below for why).
The UI opens, but when I try to write something, it gets deleted and an error appears in the command window.
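The pytorch_model.bin error without the flags makes sense if the loader only takes the GPTQ path when --wbits is set. A minimal sketch of that branching, under the assumption that this is roughly how the webui dispatches between loaders (the names here are illustrative, not the webui's actual code):

```python
from dataclasses import dataclass

@dataclass
class Args:
    wbits: int = 0       # --wbits
    groupsize: int = -1  # --groupsize

def load_model(model_name: str, args: Args) -> str:
    # Hypothetical dispatch: only the quantized path can consume a bare
    # GPTQ .pt checkpoint; the default Transformers path looks for
    # pytorch_model.bin (or safetensors) and fails without it.
    if args.wbits > 0:
        return f"GPTQ path: load {model_name}-cuda.pt (wbits={args.wbits}, groupsize={args.groupsize})"
    return f"Transformers path: expects models/{model_name}/pytorch_model.bin"

print(load_model("gpt4-x-alpaca-13b-native-4bit-128g", Args(wbits=4, groupsize=128)))
```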
Screenshot
No response
Logs
To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
File "D:\AIChatUI\text-generation-webui\modules\callbacks.py", line 66, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "D:\AIChatUI\text-generation-webui\modules\text_generation.py", line 220, in generate_with_callback
shared.model.generate(**kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
return self.sample(
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
outputs = self(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
outputs = self.model(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 426, in forward
quant_cuda.vecquant4matmul(x.float(), self.qweight, out, self.scales.float(), zeros.float(), self.g_idx)
TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: int) -> None
Invoked with: tensor([[ 0.0120, -0.0418, 0.2859, ..., -0.0147, 0.0010, 0.0085],
[ 0.0258, 0.0073, 0.0054, ..., 0.0151, -0.0106, 0.0096],
[-0.0103, 0.0106, 0.0076, ..., 0.0028, 0.0162, 0.0219],
...,
[ 0.0081, 0.0318, 0.0299, ..., -0.0010, 0.0168, 0.0154],
[ 0.0273, -0.0054, 0.0301, ..., -0.0171, 0.0634, 0.0044],
[-0.0064, -0.0233, -0.0635, ..., 0.0368, -0.0215, 0.0078]],
device='cuda:0'), tensor([[-1398026309, 1248439994, 1968657271, ..., 1648788836,
1503146616, 1432982596],
[-1129530164, -1402222200, 1685349974, ..., 2016756323,
900172105, -2007726747],
[ -876888900, -1735723399, 1717986149, ..., -1236974524,
1117231658, -1988663128],
...,
[ 2040244922, 442721970, -1501410730, ..., -1466332823,
1110137158, -878212453],
[-1196906615, 2052409206, 1768056949, ..., 2126071976,
1109693461, -611755894],
[ 1735818137, -1669052488, -1469479036, ..., 1616880563,
1484147029, -931563094]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0'), tensor([[0.0145, 0.0080, 0.0104, ..., 0.0184, 0.0109, 0.0129],
[0.0069, 0.0054, 0.0059, ..., 0.0165, 0.0122, 0.0090],
[0.0114, 0.0084, 0.0121, ..., 0.0273, 0.0139, 0.0107],
...,
[0.0121, 0.0058, 0.0124, ..., 0.0211, 0.0107, 0.0137],
[0.0138, 0.0066, 0.0162, ..., 0.0180, 0.0140, 0.0112],
[0.0087, 0.0062, 0.0055, ..., 0.0152, 0.0119, 0.0112]],
device='cuda:0'), tensor([[0.1592, 0.0716, 0.0623, ..., 0.0922, 0.0763, 0.0777],
[0.0621, 0.0433, 0.0529, ..., 0.1486, 0.0730, 0.0538],
[0.0568, 0.0923, 0.1205, ..., 0.1641, 0.1248, 0.0749],
...,
[0.1088, 0.0405, 0.0622, ..., 0.2322, 0.0753, 0.1096],
[0.0551, 0.0530, 0.1946, ..., 0.1259, 0.0983, 0.0893],
[0.0787, 0.0436, 0.0327, ..., 0.1062, 0.0717, 0.0900]],
device='cuda:0'), tensor([ 0, 0, 0, ..., 39, 39, 39], device='cuda:0', dtype=torch.int32)
Output generated in 1.03 seconds (0.00 tokens/s, 0 tokens, context 43)
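Reading the TypeError at the end of the log: the compiled quant_cuda extension expects its sixth argument to be a plain int (the group size), but the GPTQ-for-LLaMa quant.py in use passes self.g_idx, a torch.int32 tensor, so the binding rejects the call. That suggests the compiled CUDA kernel and the checked-out quant.py come from mismatched versions; rebuilding the extension against the current branch (or checking out the matching commit) should align them. As a hedged illustration only, a shim that dispatches to whichever signature the installed extension actually exposes (call_vecquant4matmul is a made-up helper, not repo code):

```python
def call_vecquant4matmul(quant_cuda, x, qweight, out, scales, zeros, g_idx, groupsize):
    """Hypothetical compatibility shim, not part of GPTQ-for-LLaMa."""
    try:
        # Newer GPTQ-for-LLaMa builds take a g_idx tensor as the last argument.
        quant_cuda.vecquant4matmul(x, qweight, out, scales, zeros, g_idx)
    except TypeError:
        # Older builds take the group size as a plain int instead, which is
        # what the traceback above says the compiled extension expects.
        quant_cuda.vecquant4matmul(x, qweight, out, scales, zeros, groupsize)
```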
System Info
Windows 10
GIGABYTE GeForce RTX 4090
Don't quote me on this, but I think either your groupsize is off or you don't have GPTQ set up right... I think...
Can anyone run this model with oobabooga now? I saw a note on the Hugging Face model card saying the model is currently incompatible with oobabooga. Or am I wrong?
I have the same issue, on Windows 10 with a 24 GB VRAM card.
It runs fine on my system: RTX 3060 12 GB GPU, 16 GB RAM, Ryzen 5 5600G CPU.
Fine on mine (RTX 4090, 96 GB RAM, i9-10900K). You need to delete one of the .pt model files (the non-CUDA one).
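For context on the advice above, a minimal sketch of why two .pt checkpoints in one model folder can misfire, assuming the loader picks a checkpoint with a simple glob (an assumption about its behavior, not the webui's actual code):

```python
from pathlib import Path

def pick_checkpoint(model_dir: str) -> Path:
    # If both the CUDA and non-CUDA .pt files are present, a naive glob
    # can hand the loader the wrong one; deleting the non-CUDA file
    # removes the ambiguity.
    candidates = sorted(Path(model_dir).glob("*.pt"))
    if not candidates:
        raise FileNotFoundError(f"no .pt checkpoint in {model_dir}")
    if len(candidates) > 1:
        print(f"warning: multiple checkpoints found: {[p.name for p in candidates]}")
    return candidates[0]
```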
These are the files I have. And I still need a solution...
Runs fine on my 2080 Ti system, aside from lacking the VRAM to really push the context length, but I get this error on my M40.
> Fine on mine (RTX 4090, 96 GB RAM, i9-10900K). You need to delete one of the .pt model files (the non-CUDA one).
I'm trying almost the same environment; however, it keeps attempting to load the model into RAM instead of VRAM.
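If the model lands in RAM instead of VRAM, one quick thing to rule out is a CPU-only PyTorch build; a two-line check using only standard PyTorch calls:

```python
import torch

# False here means PyTorch was installed without CUDA support, so any
# model load will fall back to system RAM regardless of webui flags.
print(torch.cuda.is_available())
print(torch.version.cuda)  # None on CPU-only builds
```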
Same problem with all .pt models:
Invoked with: tensor([[ 0.0097, -0.0423, 0.2747, ..., -0.0144, 0.0021, 0.0083],
[ 0.0030, 0.0149, -0.0147, ..., 0.0014, -0.0061, -0.0047],
[-0.0071, 0.0223, -0.0016, ..., -0.0235, 0.0400, -0.0051],
...,
[-0.0144, -0.0331, 0.0077, ..., 0.0066, 0.0400, -0.0034],
[-0.0100, 0.0279, 0.0171, ..., 0.0138, -0.0362, -0.0109],
[-0.0053, -0.0219, -0.0596, ..., 0.0373, -0.0200, 0.0070]],
device='cuda:0'), tensor([[-1398026309, 1248436154, 1968657271, ..., 1648788836,
1503146616, 1432982596],
[-1146307380, -1418999416, 1702123094, ..., 2016756323,
631736649, -2007726747],
[ -876888644, -1735723655, 1449550693, ..., -1236974524,
1116183082, -1988663128],
...,
[-1634813764, 730166963, -1570613979, ..., -1448437126,
1126914374, -610817348],
[-1719031703, -1418118713, 928405381, ..., -1395955303,
1094030373, -895981414],
[ 1502051209, -1096394793, -1219852926, ..., 1887417526,
1484150852, -914646341]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0., ..., 0., 0., 0.],
I am receiving the same error with this model: ethzanalytics/RedPajama-INCITE-Chat-3B-v1-GPTQ-4bit-128g
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.