text-generation-webui

GPT4 x Alpaca

Ege-P opened this issue 1 year ago • 9 comments

Describe the bug

Error using the model: gpt-x-alpaca-13b-native-4bit-128g-cuda.pt

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

1. Clone the repository from "https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g" into the models path.
2. Download gpt-x-alpaca-13b-native-4bit-128g-cuda.pt (8 GB) into that folder (a Python sketch of this step follows below).
3. Start the web UI. It must be started with the arguments --wbits 4 --groupsize 128, or a different error occurs ("no file found named: pytorch_model.bin").
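For reference, a minimal Python sketch of step 2, assuming huggingface_hub is installed; the local_dir value is only an example and should point at your webui models folder:

```python
from huggingface_hub import hf_hub_download

# Downloads the single quantized checkpoint from the model repo.
path = hf_hub_download(
    repo_id="anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g",
    filename="gpt-x-alpaca-13b-native-4bit-128g-cuda.pt",
    local_dir="models/gpt4-x-alpaca-13b-native-4bit-128g",  # example path
)
print(path)  # where the ~8 GB checkpoint landed
```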

The UI opens, but when I try to write something, it gets deleted and an error appears in the command window.

Screenshot

No response

Logs

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "D:\AIChatUI\text-generation-webui\modules\callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\AIChatUI\text-generation-webui\modules\text_generation.py", line 220, in generate_with_callback
    shared.model.generate(**kwargs)
  File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
    return self.sample(
  File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
    outputs = self(
  File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AIChatUI\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 426, in forward
    quant_cuda.vecquant4matmul(x.float(), self.qweight, out, self.scales.float(), zeros.float(), self.g_idx)
TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: int) -> None

Invoked with: tensor([[ 0.0120, -0.0418,  0.2859,  ..., -0.0147,  0.0010,  0.0085],
        [ 0.0258,  0.0073,  0.0054,  ...,  0.0151, -0.0106,  0.0096],
        [-0.0103,  0.0106,  0.0076,  ...,  0.0028,  0.0162,  0.0219],
        ...,
        [ 0.0081,  0.0318,  0.0299,  ..., -0.0010,  0.0168,  0.0154],
        [ 0.0273, -0.0054,  0.0301,  ..., -0.0171,  0.0634,  0.0044],
        [-0.0064, -0.0233, -0.0635,  ...,  0.0368, -0.0215,  0.0078]],
       device='cuda:0'), tensor([[-1398026309,  1248439994,  1968657271,  ...,  1648788836,
          1503146616,  1432982596],
        [-1129530164, -1402222200,  1685349974,  ...,  2016756323,
           900172105, -2007726747],
        [ -876888900, -1735723399,  1717986149,  ..., -1236974524,
          1117231658, -1988663128],
        ...,
        [ 2040244922,   442721970, -1501410730,  ..., -1466332823,
          1110137158,  -878212453],
        [-1196906615,  2052409206,  1768056949,  ...,  2126071976,
          1109693461,  -611755894],
        [ 1735818137, -1669052488, -1469479036,  ...,  1616880563,
          1484147029,  -931563094]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0'), tensor([[0.0145, 0.0080, 0.0104,  ..., 0.0184, 0.0109, 0.0129],
        [0.0069, 0.0054, 0.0059,  ..., 0.0165, 0.0122, 0.0090],
        [0.0114, 0.0084, 0.0121,  ..., 0.0273, 0.0139, 0.0107],
        ...,
        [0.0121, 0.0058, 0.0124,  ..., 0.0211, 0.0107, 0.0137],
        [0.0138, 0.0066, 0.0162,  ..., 0.0180, 0.0140, 0.0112],
        [0.0087, 0.0062, 0.0055,  ..., 0.0152, 0.0119, 0.0112]],
       device='cuda:0'), tensor([[0.1592, 0.0716, 0.0623,  ..., 0.0922, 0.0763, 0.0777],
        [0.0621, 0.0433, 0.0529,  ..., 0.1486, 0.0730, 0.0538],
        [0.0568, 0.0923, 0.1205,  ..., 0.1641, 0.1248, 0.0749],
        ...,
        [0.1088, 0.0405, 0.0622,  ..., 0.2322, 0.0753, 0.1096],
        [0.0551, 0.0530, 0.1946,  ..., 0.1259, 0.0983, 0.0893],
        [0.0787, 0.0436, 0.0327,  ..., 0.1062, 0.0717, 0.0900]],
       device='cuda:0'), tensor([ 0,  0,  0,  ..., 39, 39, 39], device='cuda:0', dtype=torch.int32)
Output generated in 1.03 seconds (0.00 tokens/s, 0 tokens, context 43)

System Info

Windows 10
GEFORCE RTX 4090 GIGABYTE

Ege-P avatar Apr 17 '23 22:04 Ege-P

Don't quote me on this, but I think either your group size is off or you don't have GPTQ set up right.
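One way to check the second possibility: the TypeError above is pybind11 listing the signature the compiled kernel actually accepts. A minimal sketch, assuming you run it inside the webui's environment with the compiled extension from repositories/GPTQ-for-LLaMa importable:

```python
import quant_cuda  # the compiled CUDA extension from GPTQ-for-LLaMa

# pybind11 stores the accepted signature in the docstring; it is the same
# text the TypeError prints. If the last argument is "int" (the old
# groupsize API) while quant.py line 426 passes a sixth tensor (g_idx),
# then the Python wrapper and the compiled kernel come from different
# GPTQ-for-LLaMa revisions and need to be rebuilt/matched.
print(quant_cuda.vecquant4matmul.__doc__)
```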

practical-dreamer avatar Apr 17 '23 23:04 practical-dreamer

Can anyone run this model with oobabooga now? I saw a note on the Hugging Face model card saying the model is currently incompatible with oobabooga. Or am I wrong?

I have the same issue, on Windows 10 with a 24 GB VRAM card.

shawhu avatar Apr 18 '23 10:04 shawhu

It runs fine on my system: RTX 3060 12 GB GPU, 16 GB RAM, Ryzen 5 5600G CPU.

kuso-ge avatar Apr 18 '23 11:04 kuso-ge

Fine on mine (RTX 4090, 96 GB RAM, i9-10900K); you need to delete one of the .pt model files (the non-CUDA one).

desva avatar Apr 18 '23 22:04 desva

(Screenshot: file listing of the gpt4-x-alpaca-13b-native-4bit-128g model folder, 19 04 2023 01_35_05)

These are the files I have. And I still need a solution...

Ege-P avatar Apr 18 '23 22:04 Ege-P

Runs fine on my 2080 Ti system, aside from lacking the VRAM to really push the context length, but it throws this error on my M40.
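If it helps with ruling things out: the quant kernels are compiled per GPU architecture, so checking the compute capability on each box tells you whether the M40 (Maxwell) is even targeted by your build. Just a sketch for diagnosis, not a confirmed cause of this particular TypeError:

```python
import torch

# Prints the (major, minor) compute capability of GPU 0, e.g. (7, 5) for
# a 2080 Ti and (5, 2) for an M40. If the quant_cuda build omits sm_52,
# the Maxwell card can fail where the Turing card works.
print(torch.cuda.get_device_capability(0))
```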

Pathos14489 avatar Apr 19 '23 08:04 Pathos14489

Fine on mine (RTX 4090, 96 GB RAM, i9-10900K); you need to delete one of the .pt model files (the non-CUDA one).

I'm trying almost the same environment; however, it keeps attempting to load into RAM instead of VRAM.

Ty-lerCox avatar Apr 19 '23 21:04 Ty-lerCox

Same problem with all .pt models:

Invoked with: tensor([[ 0.0097, -0.0423,  0.2747,  ..., -0.0144,  0.0021,  0.0083],
        [ 0.0030,  0.0149, -0.0147,  ...,  0.0014, -0.0061, -0.0047],
        [-0.0071,  0.0223, -0.0016,  ..., -0.0235,  0.0400, -0.0051],
        ...,
        [-0.0144, -0.0331,  0.0077,  ...,  0.0066,  0.0400, -0.0034],
        [-0.0100,  0.0279,  0.0171,  ...,  0.0138, -0.0362, -0.0109],
        [-0.0053, -0.0219, -0.0596,  ...,  0.0373, -0.0200,  0.0070]],
       device='cuda:0'), tensor([[-1398026309,  1248436154,  1968657271,  ...,  1648788836,
          1503146616,  1432982596],
        [-1146307380, -1418999416,  1702123094,  ...,  2016756323,
           631736649, -2007726747],
        [ -876888644, -1735723655,  1449550693,  ..., -1236974524,
          1116183082, -1988663128],
        ...,
        [-1634813764,   730166963, -1570613979,  ..., -1448437126,
          1126914374,  -610817348],
        [-1719031703, -1418118713,   928405381,  ..., -1395955303,
          1094030373,  -895981414],
        [ 1502051209, -1096394793, -1219852926,  ...,  1887417526,
          1484150852,  -914646341]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0.,  ..., 0., 0., 0.],
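For what it's worth, the g_idx tensor in the first log ([0, 0, 0, ..., 39, 39, 39]) looks like plain sequential groups. A quick check like this (the function name is mine) can confirm whether the checkpoint was quantized without act-order, in which case an older kernel that takes groupsize as an int would compute the same matmul:

```python
import torch

def g_idx_is_trivial(g_idx: torch.Tensor, groupsize: int) -> bool:
    """True when g_idx is just i // groupsize, i.e. the model was quantized
    without act-order."""
    expected = torch.arange(g_idx.numel(), device=g_idx.device) // groupsize
    return torch.equal(g_idx, expected.to(g_idx.dtype))

# Example with the shape implied by the log (40 groups of 128 = 5120, the
# LLaMA-13B hidden size):
print(g_idx_is_trivial(torch.arange(5120, dtype=torch.int32) // 128, 128))  # True
```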

CRCODE22 avatar Apr 21 '23 04:04 CRCODE22

I am receiving the same error with this model: ethzanalytics/RedPajama-INCITE-Chat-3B-v1-GPTQ-4bit-128g
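A possible workaround, which is an assumption on my part rather than something confirmed in this thread: load the checkpoint with AutoGPTQ, which ships kernels matched to its own Python wrapper instead of relying on a separately compiled GPTQ-for-LLaMa extension. A sketch, assuming auto-gptq is installed and the repo ships a .safetensors checkpoint; model_basename is hypothetical and must match your checkpoint's file stem:

```python
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "ethzanalytics/RedPajama-INCITE-Chat-3B-v1-GPTQ-4bit-128g",
    device="cuda:0",
    use_safetensors=True,  # assumption: repo provides a .safetensors file
    model_basename="RedPajama-INCITE-Chat-3B-v1-GPTQ-4bit-128g",  # hypothetical
)
```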

bubbabug avatar May 07 '23 21:05 bubbabug

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Aug 30 '23 23:08 github-actions[bot]